Real-world data rarely comes clean. Using Python and its libraries, we will gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it. This is called data wrangling. We will document our wrangling efforts in a Jupyter Notebook and showcase them through analyses and visualizations using Python and its libraries.
The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs, Brent". WeRateDogs has over 4 million followers and has received international media coverage.
Software we will use
Since we work in a local environment, the following libraries should be installed: pandas, NumPy, Requests, Tweepy, Pillow, seaborn, and Matplotlib.
Context
Goal: wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations.
The Data
Enhanced Twitter Archive
The WeRateDogs Twitter archive contains basic tweet data for all 2356 of their tweets. One column the archive does contain, though, is each tweet's text, from which the Udacity team has extracted the rating, dog name, and dog "stage" (i.e. doggo, floofer, pupper, and puppo) to make this Twitter archive "enhanced".
Additional Data via the Twitter API
Retweet count and favorite count are two of the notable column omissions from the archive. Fortunately, this additional data can be gathered by anyone from Twitter's API. Using this API we can extract the missing data to make our dataset more complete.
Image Predictions File
The Udacity team has run every image in the WeRateDogs Twitter archive through a neural network that can classify breeds of dogs. The result is a table of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction.
Project Details
Gathering data
Assessing data
Cleaning data
Reporting on 1) your data wrangling efforts and 2) your data analyses and visualizations
The archive data is downloaded manually from the Udacity lesson page, then loaded into a DataFrame with pandas.
The image predictions file is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv.
For this data we will use the Twitter API through the Tweepy library. Using the tweet IDs in the WeRateDogs Twitter archive, we query the Twitter API for each tweet's JSON data and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. We then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count.
As usual, we need to import useful packages before doing anything in this project.
import os
import re
import json
import tweepy
import requests
import numpy as np
import pandas as pd
import seaborn as sns
from PIL import Image
from io import BytesIO
from tweepy import OAuthHandler
import matplotlib.pyplot as plt
from timeit import default_timer as timer
This is the data we already have on hand.
archive_df = pd.read_csv('twitter-archive-enhanced.csv')
archive_df
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None |
| 1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None |
| 2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None |
| 3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Darla. She commenced a snooze mid meal... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | None | None | None | None |
| 4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Franklin. He would like you to stop ca... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | None | None | None | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a 1949 1st generation vulpix. Enj... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248... | 5 | 10 | None | None | None | None | None |
| 2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a purebred Piers Morgan. Loves to Netf... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226... | 6 | 10 | a | None | None | None | None |
| 2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href="http://twitter.com/download/iphone" r... | Here is a very happy pup. Big fan of well-main... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412... | 9 | 10 | a | None | None | None | None |
| 2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a western brown Mitsubishi terrier. Up... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285... | 7 | 10 | a | None | None | None | None |
| 2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a Japanese Irish Setter. Lost eye... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888... | 8 | 10 | None | None | None | None | None |
2356 rows × 17 columns
The image predictions file records what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image-predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically.
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
r = requests.get(url)
with open('image-predictions.tsv', 'wb') as f:
    f.write(r.content)
image_df = pd.read_csv('image-predictions.tsv', sep='\t')
image_df
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
| 2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
| 3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
| 4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
| 2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
| 2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
| 2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
| 2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
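Each row of the table above carries three ranked predictions (p1, p2, p3), each with a confidence and a flag saying whether the predicted label is a dog breed. The following is a minimal sketch of one way such a row could be reduced to a single breed guess. It assumes that p1 to p3 are ordered from most to least confident, which the sample rows suggest but the source does not state; `best_dog_prediction` is a hypothetical helper name, not part of the original notebook.

```python
from typing import Optional, Tuple


def best_dog_prediction(row) -> Optional[Tuple[str, float]]:
    """Return (breed, confidence) for the first of p1..p3 flagged as a dog.

    Assumes p1..p3 are ordered from most to least confident.
    Returns None when none of the three predictions is a dog breed.
    """
    for rank in (1, 2, 3):
        if row[f"p{rank}_dog"]:
            return row[f"p{rank}"], row[f"p{rank}_conf"]
    return None


# Two rows copied from the table above (indexes 2071 and 2074)
paper_towel_row = {
    "p1": "paper_towel", "p1_conf": 0.170278, "p1_dog": False,
    "p2": "Labrador_retriever", "p2_conf": 0.168086, "p2_dog": True,
    "p3": "spatula", "p3_conf": 0.040836, "p3_dog": False,
}
orange_row = {
    "p1": "orange", "p1_conf": 0.097049, "p1_dog": False,
    "p2": "bagel", "p2_conf": 0.085851, "p2_dog": False,
    "p3": "banana", "p3_conf": 0.076110, "p3_dog": False,
}

print(best_dog_prediction(paper_towel_row))
print(best_dog_prediction(orange_row))
```

Because the helper only needs key lookup, it could probably also be applied row by row with `image_df.apply(best_dog_prediction, axis=1)`.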
Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count. Note: do not include your Twitter API keys, secrets, and tokens in your project submission.
# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
# NOTE TO STUDENT WITH MOBILE VERIFICATION ISSUES:
# df_1 is a DataFrame with the twitter_archive_enhanced.csv file. You may have to
# change line 17 to match the name of your DataFrame with twitter_archive_enhanced.csv
# NOTE TO REVIEWER: this student had mobile verification issues so the following
# Twitter API code was sent to this student from a Udacity instructor
# Tweet IDs for which to gather additional data via Twitter's API
tweet_ids = archive_df.tweet_id.values
len(tweet_ids)
# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        # tweepy.TweepError is the Tweepy 3.x name; Tweepy 4 renamed it to
        # tweepy.errors.TweepyException
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
end = timer()
print(end - start)
print(fails_dict)
tweepy_df = pd.read_json("tweet_json.txt", lines=True)
tweepy_df
| | created_at | id | id_str | full_text | truncated | display_text_range | entities | extended_entities | source | in_reply_to_status_id | ... | favorited | retweeted | possibly_sensitive | possibly_sensitive_appealable | lang | retweeted_status | quoted_status_id | quoted_status_id_str | quoted_status_permalink | quoted_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-08-01 16:23:56+00:00 | 892420643555336193 | 892420643555336192 | This is Phineas. He's a mystical boy. Only eve... | False | [0, 85] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 892420639486877696, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 1 | 2017-08-01 00:17:27+00:00 | 892177421306343426 | 892177421306343424 | This is Tilly. She's just checking pup on you.... | False | [0, 138] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 892177413194625024, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2 | 2017-07-31 00:18:03+00:00 | 891815181378084864 | 891815181378084864 | This is Archie. He is a rare Norwegian Pouncin... | False | [0, 121] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 891815175371796480, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 3 | 2017-07-30 15:58:51+00:00 | 891689557279858688 | 891689557279858688 | This is Darla. She commenced a snooze mid meal... | False | [0, 79] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 891689552724799489, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 4 | 2017-07-29 16:00:24+00:00 | 891327558926688256 | 891327558926688256 | This is Franklin. He would like you to stop ca... | False | [0, 138] | {'hashtags': [{'text': 'BarkWeek', 'indices': ... | {'media': [{'id': 891327551943041024, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2317 | 2015-11-16 00:24:50+00:00 | 666049248165822465 | 666049248165822464 | Here we have a 1949 1st generation vulpix. Enj... | False | [0, 120] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666049244999131136, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2318 | 2015-11-16 00:04:52+00:00 | 666044226329800704 | 666044226329800704 | This is a purebred Piers Morgan. Loves to Netf... | False | [0, 137] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666044217047650304, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2319 | 2015-11-15 23:21:54+00:00 | 666033412701032449 | 666033412701032448 | Here is a very happy pup. Big fan of well-main... | False | [0, 130] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666033409081393153, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2320 | 2015-11-15 23:05:30+00:00 | 666029285002620928 | 666029285002620928 | This is a western brown Mitsubishi terrier. Up... | False | [0, 139] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666029276303482880, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2321 | 2015-11-15 22:32:08+00:00 | 666020888022790149 | 666020888022790144 | Here we have a Japanese Irish Setter. Lost eye... | False | [0, 131] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666020881337073664, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
2322 rows × 32 columns
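The instructions ask for the file to be read line by line, keeping at minimum the tweet ID, retweet count, and favorite count. The sketch below shows one way that could look using only the standard library. `extract_counts` is a hypothetical helper, and the sample lines are made up (the counts are invented) so that the example runs without the real tweet_json.txt. One point in favor of this route: Python's json module keeps the 18-digit IDs as exact integers, whereas several id_str values in the table above differ from id in their final digits, which suggests they were rounded during parsing.

```python
import json
from typing import Iterable, List


def extract_counts(lines: Iterable[str]) -> List[dict]:
    """Keep only tweet ID, retweet count, and favorite count from JSON lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows


# Made-up sample lines standing in for tweet_json.txt (counts are invented)
sample_lines = [
    '{"id": 892420643555336193, "retweet_count": 10, "favorite_count": 20}',
    '{"id": 892177421306343426, "retweet_count": 3, "favorite_count": 7}',
]

rows = extract_counts(sample_lines)
print(rows[0])
```

With the real file, the open file handle can be passed directly (`with open('tweet_json.txt') as f: rows = extract_counts(f)`), and `pd.DataFrame(rows)` then yields a three-column DataFrame.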
In this step, we assess the three datasets both visually and programmatically for quality and tidiness issues. We rely heavily on pandas methods, namely:
- .describe() to see summary statistics
- .info() to see the data type of each column and detect missing data
- .duplicated() to see if there are any duplicated rows
Key Points
Key points in the data wrangling process for this project:
archive_df
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None |
| 1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None |
| 2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None |
| 3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Darla. She commenced a snooze mid meal... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | None | None | None | None |
| 4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href="http://twitter.com/download/iphone" r... | This is Franklin. He would like you to stop ca... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | None | None | None | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a 1949 1st generation vulpix. Enj... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248... | 5 | 10 | None | None | None | None | None |
| 2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a purebred Piers Morgan. Loves to Netf... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226... | 6 | 10 | a | None | None | None | None |
| 2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href="http://twitter.com/download/iphone" r... | Here is a very happy pup. Big fan of well-main... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412... | 9 | 10 | a | None | None | None | None |
| 2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href="http://twitter.com/download/iphone" r... | This is a western brown Mitsubishi terrier. Up... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285... | 7 | 10 | a | None | None | None | None |
| 2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href="http://twitter.com/download/iphone" r... | Here we have a Japanese Irish Setter. Lost eye... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888... | 8 | 10 | None | None | None | None | None |
2356 rows × 17 columns
archive_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   tweet_id                    2356 non-null   int64
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object
 4   source                      2356 non-null   object
 5   text                        2356 non-null   object
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object
 9   expanded_urls               2297 non-null   object
 10  rating_numerator            2356 non-null   int64
 11  rating_denominator          2356 non-null   int64
 12  name                        2356 non-null   object
 13  doggo                       2356 non-null   object
 14  floofer                     2356 non-null   object
 15  pupper                      2356 non-null   object
 16  puppo                       2356 non-null   object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
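The info() output lists timestamp with dtype object, meaning the values are stored as plain strings rather than datetimes. The sketch below shows how one such string could be parsed into a timezone-aware datetime with the standard library. The format string is an assumption inferred from sample values such as '2017-08-01 16:23:56 +0000', and `parse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed format, inferred from values such as '2017-08-01 16:23:56 +0000'
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_timestamp(raw: str) -> datetime:
    """Parse one archive timestamp string into a timezone-aware datetime."""
    return datetime.strptime(raw, TIMESTAMP_FORMAT)


parsed = parse_timestamp("2017-08-01 16:23:56 +0000")
print(parsed.isoformat())
```

In pandas, the whole column would more typically be converted in one step with `pd.to_datetime(archive_df['timestamp'])`.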
archive_df.loc[archive_df['retweeted_status_id'].notnull()]
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: This is Canela. She attempted s... | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957... | 13 | 10 | Canela | None | None | None | None |
| 32 | 886054160059072513 | NaN | NaN | 2017-07-15 02:45:48 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @Athletics: 12/10 #BATP https://t.co/WxwJmv... | 8.860537e+17 | 1.960740e+07 | 2017-07-15 02:44:07 +0000 | https://twitter.com/dog_rates/status/886053434... | 12 | 10 | None | None | None | None | None |
| 36 | 885311592912609280 | NaN | NaN | 2017-07-13 01:35:06 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: This is Lilly. She just paralle... | 8.305833e+17 | 4.196984e+09 | 2017-02-12 01:04:29 +0000 | https://twitter.com/dog_rates/status/830583320... | 13 | 10 | Lilly | None | None | None | None |
| 68 | 879130579576475649 | NaN | NaN | 2017-06-26 00:13:58 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: This is Emmy. She was adopted t... | 8.780576e+17 | 4.196984e+09 | 2017-06-23 01:10:23 +0000 | https://twitter.com/dog_rates/status/878057613... | 14 | 10 | Emmy | None | None | None | None |
| 73 | 878404777348136964 | NaN | NaN | 2017-06-24 00:09:53 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: Meet Shadow. In an attempt to r... | 8.782815e+17 | 4.196984e+09 | 2017-06-23 16:00:04 +0000 | https://www.gofundme.com/3yd6y1c,https://twitt... | 13 | 10 | Shadow | None | None | None | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1023 | 746521445350707200 | NaN | NaN | 2016-06-25 01:52:36 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: This is Shaggy. He knows exactl... | 6.678667e+17 | 4.196984e+09 | 2015-11-21 00:46:50 +0000 | https://twitter.com/dog_rates/status/667866724... | 10 | 10 | Shaggy | None | None | None | None |
| 1043 | 743835915802583040 | NaN | NaN | 2016-06-17 16:01:16 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @dog_rates: Extremely intelligent dog here.... | 6.671383e+17 | 4.196984e+09 | 2015-11-19 00:32:12 +0000 | https://twitter.com/dog_rates/status/667138269... | 10 | 10 | None | None | None | None | None |
| 1242 | 711998809858043904 | NaN | NaN | 2016-03-21 19:31:59 +0000 | <a href="http://twitter.com/download/iphone" r... | RT @twitter: @dog_rates Awesome Tweet! 12/10. ... | 7.119983e+17 | 7.832140e+05 | 2016-03-21 19:29:52 +0000 | https://twitter.com/twitter/status/71199827977... | 12 | 10 | None | None | None | None | None |
| 2259 | 667550904950915073 | NaN | NaN | 2015-11-20 03:51:52 +0000 | <a href="http://twitter.com" rel="nofollow">Tw... | RT @dogratingrating: Exceptional talent. Origi... | 6.675487e+17 | 4.296832e+09 | 2015-11-20 03:43:06 +0000 | https://twitter.com/dogratingrating/status/667... | 12 | 10 | None | None | None | None | None |
| 2260 | 667550882905632768 | NaN | NaN | 2015-11-20 03:51:47 +0000 | <a href="http://twitter.com" rel="nofollow">Tw... | RT @dogratingrating: Unoriginal idea. Blatant ... | 6.675484e+17 | 4.296832e+09 | 2015-11-20 03:41:59 +0000 | https://twitter.com/dogratingrating/status/667... | 5 | 10 | None | None | None | None | None |
181 rows × 17 columns
archive_df.describe()
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | retweeted_status_id | retweeted_status_user_id | rating_numerator | rating_denominator |
|---|---|---|---|---|---|---|---|
| count | 2.356000e+03 | 7.800000e+01 | 7.800000e+01 | 1.810000e+02 | 1.810000e+02 | 2356.000000 | 2356.000000 |
| mean | 7.427716e+17 | 7.455079e+17 | 2.014171e+16 | 7.720400e+17 | 1.241698e+16 | 13.126486 | 10.455433 |
| std | 6.856705e+16 | 7.582492e+16 | 1.252797e+17 | 6.236928e+16 | 9.599254e+16 | 45.876648 | 6.745237 |
| min | 6.660209e+17 | 6.658147e+17 | 1.185634e+07 | 6.661041e+17 | 7.832140e+05 | 0.000000 | 0.000000 |
| 25% | 6.783989e+17 | 6.757419e+17 | 3.086374e+08 | 7.186315e+17 | 4.196984e+09 | 10.000000 | 10.000000 |
| 50% | 7.196279e+17 | 7.038708e+17 | 4.196984e+09 | 7.804657e+17 | 4.196984e+09 | 11.000000 | 10.000000 |
| 75% | 7.993373e+17 | 8.257804e+17 | 4.196984e+09 | 8.203146e+17 | 4.196984e+09 | 12.000000 | 10.000000 |
| max | 8.924206e+17 | 8.862664e+17 | 8.405479e+17 | 8.874740e+17 | 7.874618e+17 | 1776.000000 | 170.000000 |
# check numerator value counts
archive_df.rating_numerator.value_counts()
12      558
11      464
10      461
13      351
9       158
8       102
7        55
14       54
5        37
6        32
3        19
4        17
1         9
2         9
420       2
0         2
15        2
75        2
80        1
20        1
24        1
26        1
44        1
50        1
60        1
165       1
84        1
88        1
144       1
182       1
143       1
666       1
960       1
1776      1
17        1
27        1
45        1
99        1
121       1
204       1
Name: rating_numerator, dtype: int64
# check single numerator text value
single_numerator = archive_df.rating_numerator.value_counts().index[-22:]
single_numerator_index = []
for s in single_numerator:
    row = archive_df.index[archive_df['rating_numerator'] == s].to_list()
    single_numerator_index.append(row[0])
for s in single_numerator_index:
    print(s, "\t", archive_df['text'][s], "\t",
          archive_df['rating_numerator'][s])
1254    Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12    80
1663    I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible    20
516    Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. Keep Sam smiling by clicking and sharing this link: https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx    24
1712    Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD    26
1433    Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ    44
1202    This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq    50
1351    Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa    60
902    Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE    165
433    The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd    84
1843    Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw    88
1779    IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq    144
290    @markhoppus 182/10    182
1634    Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3    143
189    @s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10    666
313    @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho    960
979    This is Atticus. He's quite simply America af. 1776/10 https://t.co/GRXwMxLBkh    1776
55    @roushfenway These are good dogs but 17/10 is an emotional impulse rating. More like 13/10s    17
763    This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq    27
1274    From left to right: Cletus, Jerome, Alejandro, Burp, & Titson None know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK    45
1228    Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1    99
1635    Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55    121
1120    Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv    204
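The printed tweets show several ways the stored numerator can disagree with the text: decimal ratings such as 11.26/10 were captured as 26, non-rating fractions such as 24/7 and 50/50 were picked up, and some tweets contain two candidate ratings. The sketch below is one possible heuristic for re-extracting ratings with a regular expression. It is an illustration under stated assumptions, not the method used to build the archive; `RATING_PATTERN`, `find_rating_candidates`, and `pick_rating` are hypothetical names, and the rule "prefer the last candidate out of 10" is an assumption.

```python
import re
from typing import List, Optional, Tuple

# A number (optionally with a decimal part), a slash, then an integer
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def find_rating_candidates(text: str) -> List[Tuple[float, int]]:
    """Return every numerator/denominator pair found in the text."""
    return [(float(num), int(den)) for num, den in RATING_PATTERN.findall(text)]


def pick_rating(text: str) -> Optional[Tuple[float, int]]:
    """Heuristic (an assumption): prefer the last candidate whose denominator is 10."""
    out_of_ten = [c for c in find_rating_candidates(text) if c[1] == 10]
    return out_of_ten[-1] if out_of_ten else None


decimal_text = ("Here we have uncovered an entire battalion of holiday puppers. "
                "Average of 11.26/10 https://t.co/eNm2S6p9BD")
two_ratings_text = ("@jonnysun @Lin_Manuel ok jomny I know you're excited but "
                    "960/00 isn't a valid rating, 13/10 is tho")
no_rating_text = "Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer."

print(pick_rating(decimal_text))
print(pick_rating(two_ratings_text))
print(pick_rating(no_rating_text))
```

Group ratings such as 84/70 return None under this rule, so they would need separate handling or manual review.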
# check rating_denominator value counts
archive_df.rating_denominator.value_counts()
10     2333
11        3
50        3
80        2
20        2
2         1
16        1
40        1
70        1
15        1
90        1
110       1
120       1
130       1
150       1
170       1
7         1
0         1
Name: rating_denominator, dtype: int64
# check single denominator text value
single_denominator = archive_df.rating_denominator.value_counts().index[5:]
single_denominator_index = []
for s in single_denominator:
    row = archive_df.index[archive_df['rating_denominator'] == s].to_list()
    single_denominator_index.append(row[0])
for s in single_denominator_index:
    print(s, "\t", archive_df['text'][s], "\t",
          archive_df['rating_denominator'][s])
2335    This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv    2
1663    I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible    16
1433    Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ    40
433    The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd    70
342    @docmisterio account started on 11/15/15    15
1228    Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1    90
1635    Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55    110
1779    IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq    120
1634    Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3    130
902    Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE    150
1120    Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv    170
516    Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. Keep Sam smiling by clicking and sharing this link: https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx    7
313    @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho    0
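Several of the unusual denominators above (40, 70, 90, 110, and so on) appear in tweets that rate a group of dogs at once, so the rating looks like a per-dog score multiplied by the number of dogs. One possible way to make such ratings comparable, offered here as an assumption rather than as a step the notebook performs, is to rescale them to a denominator of 10. `rating_out_of_ten` is a hypothetical helper name.

```python
from typing import Optional


def rating_out_of_ten(numerator: float, denominator: int) -> Optional[float]:
    """Rescale a rating to a denominator of 10.

    Returns None when the denominator is zero, negative, or not a multiple
    of 10, since such pairs (24/7, 960/00, 11/15) are probably not ratings.
    """
    if denominator <= 0 or denominator % 10 != 0:
        return None
    return numerator * 10 / denominator


print(rating_out_of_ten(84, 70))    # group rating from tweet 433
print(rating_out_of_ten(165, 150))  # group rating from tweet 902
print(rating_out_of_ten(24, 7))     # "24/7" is not a rating
print(rating_out_of_ten(960, 0))    # zero denominator
```

Rows for which the helper returns None are good candidates for manual inspection of the tweet text.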
archive_df[archive_df.duplicated()]
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
image_df
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
| 2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
| 3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
| 4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
| 2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
| 2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
| 2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
| 2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
image_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | tweet_id | 2075 non-null | int64 |
| 1 | jpg_url | 2075 non-null | object |
| 2 | img_num | 2075 non-null | int64 |
| 3 | p1 | 2075 non-null | object |
| 4 | p1_conf | 2075 non-null | float64 |
| 5 | p1_dog | 2075 non-null | bool |
| 6 | p2 | 2075 non-null | object |
| 7 | p2_conf | 2075 non-null | float64 |
| 8 | p2_dog | 2075 non-null | bool |
| 9 | p3 | 2075 non-null | object |
| 10 | p3_conf | 2075 non-null | float64 |
| 11 | p3_dog | 2075 non-null | bool |

dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
image_df[image_df.jpg_url.duplicated()]
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1297 | 752309394570878976 | https://pbs.twimg.com/ext_tw_video_thumb/67535... | 1 | upright | 0.303415 | False | golden_retriever | 0.181351 | True | Brittany_spaniel | 0.162084 | True |
| 1315 | 754874841593970688 | https://pbs.twimg.com/media/CWza7kpWcAAdYLc.jpg | 1 | pug | 0.272205 | True | bull_mastiff | 0.251530 | True | bath_towel | 0.116806 | False |
| 1333 | 757729163776290825 | https://pbs.twimg.com/media/CWyD2HGUYAQ1Xa7.jpg | 2 | cash_machine | 0.802333 | False | schipperke | 0.045519 | True | German_shepherd | 0.023353 | True |
| 1345 | 759159934323924993 | https://pbs.twimg.com/media/CU1zsMSUAAAS0qW.jpg | 1 | Irish_terrier | 0.254856 | True | briard | 0.227716 | True | soft-coated_wheaten_terrier | 0.223263 | True |
| 1349 | 759566828574212096 | https://pbs.twimg.com/media/CkNjahBXAAQ2kWo.jpg | 1 | Labrador_retriever | 0.967397 | True | golden_retriever | 0.016641 | True | ice_bear | 0.014858 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1903 | 851953902622658560 | https://pbs.twimg.com/media/C4KHj-nWQAA3poV.jpg | 1 | Staffordshire_bullterrier | 0.757547 | True | American_Staffordshire_terrier | 0.149950 | True | Chesapeake_Bay_retriever | 0.047523 | True |
| 1944 | 861769973181624320 | https://pbs.twimg.com/media/CzG425nWgAAnP7P.jpg | 2 | Arabian_camel | 0.366248 | False | house_finch | 0.209852 | False | cocker_spaniel | 0.046403 | True |
| 1992 | 873697596434513921 | https://pbs.twimg.com/media/DA7iHL5U0AA1OQo.jpg | 1 | laptop | 0.153718 | False | French_bulldog | 0.099984 | True | printer | 0.077130 | False |
| 2041 | 885311592912609280 | https://pbs.twimg.com/media/C4bTH6nWMAAX_bJ.jpg | 1 | Labrador_retriever | 0.908703 | True | seat_belt | 0.057091 | False | pug | 0.011933 | True |
| 2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
66 rows × 12 columns
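The 66 rows above are only the later copies of each repeated `jpg_url`, because `duplicated()` keeps the first occurrence unflagged by default. To compare both members of each pair, `keep=False` can be used. This is a minimal sketch on a toy frame; the IDs and URLs are hypothetical.

```python
import pandas as pd

# Toy stand-in for image_df (hypothetical IDs and URLs).
toy_images = pd.DataFrame({
    "tweet_id": [10, 11, 12, 13],
    "jpg_url": ["u1.jpg", "u2.jpg", "u1.jpg", "u3.jpg"],
})

# Default keep="first": only the second and later copies are flagged.
later_copies = toy_images[toy_images.jpg_url.duplicated()]

# keep=False: every row that shares a URL is flagged, sorted for side-by-side reading.
all_copies = (
    toy_images[toy_images.jpg_url.duplicated(keep=False)]
    .sort_values("jpg_url")
)

# One possible cleaning step: keep the first tweet for each image.
deduped = toy_images.drop_duplicates(subset="jpg_url", keep="first")

print(len(later_copies), len(all_copies), len(deduped))
```

Whether to keep the first or the last copy is a judgment call; if the repeated images come from retweets, removing retweets first may make this step unnecessary.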
tweepy_df
| | created_at | id | id_str | full_text | truncated | display_text_range | entities | extended_entities | source | in_reply_to_status_id | ... | favorited | retweeted | possibly_sensitive | possibly_sensitive_appealable | lang | retweeted_status | quoted_status_id | quoted_status_id_str | quoted_status_permalink | quoted_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-08-01 16:23:56+00:00 | 892420643555336193 | 892420643555336192 | This is Phineas. He's a mystical boy. Only eve... | False | [0, 85] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 892420639486877696, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 1 | 2017-08-01 00:17:27+00:00 | 892177421306343426 | 892177421306343424 | This is Tilly. She's just checking pup on you.... | False | [0, 138] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 892177413194625024, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2 | 2017-07-31 00:18:03+00:00 | 891815181378084864 | 891815181378084864 | This is Archie. He is a rare Norwegian Pouncin... | False | [0, 121] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 891815175371796480, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 3 | 2017-07-30 15:58:51+00:00 | 891689557279858688 | 891689557279858688 | This is Darla. She commenced a snooze mid meal... | False | [0, 79] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 891689552724799489, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 4 | 2017-07-29 16:00:24+00:00 | 891327558926688256 | 891327558926688256 | This is Franklin. He would like you to stop ca... | False | [0, 138] | {'hashtags': [{'text': 'BarkWeek', 'indices': ... | {'media': [{'id': 891327551943041024, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2317 | 2015-11-16 00:24:50+00:00 | 666049248165822465 | 666049248165822464 | Here we have a 1949 1st generation vulpix. Enj... | False | [0, 120] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666049244999131136, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2318 | 2015-11-16 00:04:52+00:00 | 666044226329800704 | 666044226329800704 | This is a purebred Piers Morgan. Loves to Netf... | False | [0, 137] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666044217047650304, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2319 | 2015-11-15 23:21:54+00:00 | 666033412701032449 | 666033412701032448 | Here is a very happy pup. Big fan of well-main... | False | [0, 130] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666033409081393153, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2320 | 2015-11-15 23:05:30+00:00 | 666029285002620928 | 666029285002620928 | This is a western brown Mitsubishi terrier. Up... | False | [0, 139] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666029276303482880, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2321 | 2015-11-15 22:32:08+00:00 | 666020888022790149 | 666020888022790144 | Here we have a Japanese Irish Setter. Lost eye... | False | [0, 131] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 666020881337073664, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
2322 rows × 32 columns
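In the preview above, `id` and `id_str` disagree in their last digits for some rows (for example 892420643555336193 versus 892420643555336192). One plausible cause, stated here as an assumption because the loading code is not shown in this section, is that `id_str` passed through a 64-bit float, which cannot represent every integer of this size. The sketch below reproduces that rounding and shows one way to keep the IDs exact by parsing JSON lines with the standard library; the sample record and its `retweet_count` value are hypothetical.

```python
import io
import json

import pandas as pd

# A float64 cannot hold this integer exactly, so a float round trip changes it.
exact_id = 892420643555336193
rounded_id = int(float(exact_id))
print(rounded_id)  # 892420643555336192

# Hypothetical one-line sample in the shape of a tweet JSON record.
raw = io.StringIO(
    '{"id": 892420643555336193, "id_str": "892420643555336193", "retweet_count": 1}\n'
)

# json.loads keeps "id" as a Python int and "id_str" as a str, so nothing is rounded.
records = [json.loads(line) for line in raw]
toy_tweets = pd.DataFrame(records)

print(toy_tweets.dtypes)
```

With IDs stored as strings, joins against the other tables compare exact values, provided the other tables' `tweet_id` columns are converted to strings as well.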
tweepy_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 32 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | created_at | 2322 non-null | datetime64[ns, UTC] |
| 1 | id | 2322 non-null | int64 |
| 2 | id_str | 2322 non-null | int64 |
| 3 | full_text | 2322 non-null | object |
| 4 | truncated | 2322 non-null | bool |
| 5 | display_text_range | 2322 non-null | object |
| 6 | entities | 2322 non-null | object |
| 7 | extended_entities | 2050 non-null | object |
| 8 | source | 2322 non-null | object |
| 9 | in_reply_to_status_id | 76 non-null | float64 |
| 10 | in_reply_to_status_id_str | 76 non-null | float64 |
| 11 | in_reply_to_user_id | 76 non-null | float64 |
| 12 | in_reply_to_user_id_str | 76 non-null | float64 |
| 13 | in_reply_to_screen_name | 76 non-null | object |
| 14 | user | 2322 non-null | object |
| 15 | geo | 0 non-null | float64 |
| 16 | coordinates | 0 non-null | float64 |
| 17 | place | 1 non-null | object |
| 18 | contributors | 0 non-null | float64 |
| 19 | is_quote_status | 2322 non-null | bool |
| 20 | retweet_count | 2322 non-null | int64 |
| 21 | favorite_count | 2322 non-null | int64 |
| 22 | favorited | 2322 non-null | bool |
| 23 | retweeted | 2322 non-null | bool |
| 24 | possibly_sensitive | 2187 non-null | float64 |
| 25 | possibly_sensitive_appealable | 2187 non-null | float64 |
| 26 | lang | 2322 non-null | object |
| 27 | retweeted_status | 162 non-null | object |
| 28 | quoted_status_id | 26 non-null | float64 |
| 29 | quoted_status_id_str | 26 non-null | float64 |
| 30 | quoted_status_permalink | 26 non-null | object |
| 31 | quoted_status | 24 non-null | object |

dtypes: bool(4), datetime64[ns, UTC](1), float64(11), int64(4), object(12)
memory usage: 517.1+ KB
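The summary shows `retweeted_status` is non-null for 162 of the 2322 rows, which suggests those rows are retweets. Because each non-null value is a large nested dictionary, a boolean mask is usually easier to read than listing the values themselves. The following is a minimal sketch on a toy frame; the values are hypothetical, and treating a non-null `retweeted_status` as the retweet marker is an assumption about this dataset.

```python
import pandas as pd

# Toy stand-in for tweepy_df (hypothetical values).
toy_api = pd.DataFrame({
    "id": [1, 2, 3],
    "retweeted_status": [None, {"id": 99}, None],
    "retweet_count": [5, 7, 9],
    "favorite_count": [50, 0, 90],
})

# True where the row carries an embedded original tweet, i.e. it is a retweet.
is_retweet = toy_api["retweeted_status"].notna()
n_retweets = int(is_retweet.sum())

# Keep only original tweets and the columns needed for the analysis.
originals = toy_api.loc[~is_retweet, ["id", "retweet_count", "favorite_count"]]

print(n_retweets, "retweets;", len(originals), "original tweets")
```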
tweepy_df['retweeted_status'].value_counts()
{'created_at': 'Sat Jul 15 02:44:07 +0000 2017', 'id': 886053734421102592, 'id_str': '886053734421102592', 'full_text': '12/10 #BATP https://t.co/WxwJmvjfxo', 'truncated': False, 'display_text_range': [0, 11], 'entities': {'hashtags': [{'text': 'BATP', 'indices': [6, 11]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/WxwJmvjfxo', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873', 'display_url': 'twitter.com/dog_rates/stat…', 'indices': [12, 35]}]}, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 19607400, 'id_str': '19607400', 'name': 'Oakland A's', 'screen_name': 'Athletics', 'location': 'Oakland, CA', 'description': 'Official Twitter of the nine-time World Series champion Athletics | #RootedInOakland | Instagram: @athletics | Snapchat: athletics', 'url': 'https://t.co/r4DoRNY1zr', 'entities': {'url': {'urls': [{'url': 'https://t.co/r4DoRNY1zr', 'expanded_url': 'http://www.athletics.com', 'display_url': 'athletics.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 565555, 'friends_count': 542, 'listed_count': 5162, 'created_at': 'Tue Jan 27 18:40:21 +0000 2009', 'favourites_count': 27445, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 57978, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FCB514', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1286704475059531777/dGrbr0eo_normal.jpg', 'profile_image_url_https': 
'https://pbs.twimg.com/profile_images/1286704475059531777/dGrbr0eo_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/19607400/1595792133', 'profile_link_color': '2B463A', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '7BD193', 'profile_text_color': '333333', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': True, 'quoted_status_id': 886053434075471873, 'quoted_status_id_str': '886053434075471873', 'quoted_status_permalink': {'url': 'https://t.co/WxwJmvjfxo', 'expanded': 'https://twitter.com/dog_rates/status/886053434075471873', 'display': 'twitter.com/dog_rates/stat…'}, 'quoted_status': {'created_at': 'Sat Jul 15 02:42:55 +0000 2017', 'id': 886053434075471873, 'id_str': '886053434075471873', 'full_text': 'Our snapchat story is h*ckin ridiculous right now. The @Athletics really know how to host a Bark at the Park
https://t.co/gJx2GpMSyY https://t.co/6d2N0ctyC1', 'truncated': False, 'display_text_range': [0, 132], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'Athletics', 'name': "Oakland A's", 'id': 19607400, 'id_str': '19607400', 'indices': [55, 65]}], 'urls': [{'url': 'https://t.co/gJx2GpMSyY', 'expanded_url': 'https://www.snapchat.com/add/weratedogs', 'display_url': 'snapchat.com/add/weratedogs', 'indices': [109, 132]}], 'media': [{'id': 886053427184254976, 'id_str': '886053427184254976', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'url': 'https://t.co/6d2N0ctyC1', 'display_url': 'pic.twitter.com/6d2N0ctyC1', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 750, 'h': 1334, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 886053427184254976, 'id_str': '886053427184254976', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'url': 'https://t.co/6d2N0ctyC1', 'display_url': 'pic.twitter.com/6d2N0ctyC1', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 750, 'h': 1334, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': 
'4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815727, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 190, 'favorite_count': 3064, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 
'possibly_sensitive_appealable': False, 'lang': 'en'}, 'retweet_count': 100, 'favorite_count': 1442, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'und'} 1
{'created_at': 'Sat May 28 03:04:00 +0000 2016', 'id': 736392552031657984, 'id_str': '736392552031657984', 'full_text': 'Say hello to mad pupper. You know what you did. 13/10 would pet until no longer furustrated https://t.co/u1ulQ5heLX', 'truncated': False, 'display_text_range': [0, 115], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/u1ulQ5heLX', 'expanded_url': 'https://vine.co/v/iEggaEOiLO3', 'display_url': 'vine.co/v/iEggaEOiLO3', 'indices': [92, 115]}]}, 'source': '<a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815742, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 7251, 'favorite_count': 17450, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Tue Sep 13 16:30:07 +0000 2016', 'id': 775733305207554048, 'id_str': '775733305207554048', 'full_text': 'This is Anakin. He strives to reach his full doggo potential. Born with blurry tail tho. 11/10 would still pet well https://t.co/9CcBSxCXXG', 'truncated': False, 'display_text_range': [0, 115], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 775733297511067649, 'id_str': '775733297511067649', 'indices': [116, 139], 'media_url': 'http://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'url': 'https://t.co/9CcBSxCXXG', 'display_url': 'pic.twitter.com/9CcBSxCXXG', 'expanded_url': 'https://twitter.com/dog_rates/status/775733305207554048/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 600, 'h': 600, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 600, 'h': 600, 'resize': 'fit'}, 'small': {'w': 600, 'h': 600, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 775733297511067649, 'id_str': '775733297511067649', 'indices': [116, 139], 'media_url': 'http://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'url': 'https://t.co/9CcBSxCXXG', 'display_url': 'pic.twitter.com/9CcBSxCXXG', 'expanded_url': 'https://twitter.com/dog_rates/status/775733305207554048/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 600, 'h': 600, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 600, 'h': 600, 'resize': 'fit'}, 'small': {'w': 600, 'h': 600, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 
'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815740, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 3998, 'favorite_count': 13921, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Thu Nov 19 00:32:12 +0000 2015', 'id': 667138269671505920, 'id_str': '667138269671505920', 'full_text': 'Extremely intelligent dog here. Has learned to walk like human. Even has his own dog. Very impressive 10/10 https://t.co/0DvHAMdA4V', 'truncated': False, 'display_text_range': [0, 131], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 667138263048585216, 'id_str': '667138263048585216', 'indices': [108, 131], 'media_url': 'http://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'url': 'https://t.co/0DvHAMdA4V', 'display_url': 'pic.twitter.com/0DvHAMdA4V', 'expanded_url': 'https://twitter.com/dog_rates/status/667138269671505920/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 862, 'resize': 'fit'}, 'small': {'w': 680, 'h': 572, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 862, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 667138263048585216, 'id_str': '667138263048585216', 'indices': [108, 131], 'media_url': 'http://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'url': 'https://t.co/0DvHAMdA4V', 'display_url': 'pic.twitter.com/0DvHAMdA4V', 'expanded_url': 'https://twitter.com/dog_rates/status/667138269671505920/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 862, 'resize': 'fit'}, 'small': {'w': 680, 'h': 572, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 862, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 
'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815745, 'friends_count': 17, 'listed_count': 5696, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 2044, 'favorite_count': 4332, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Sat Dec 17 00:38:52 +0000 2016', 'id': 809920764300447744, 'id_str': '809920764300447744', 'full_text': 'Please only send in dogs. We only rate dogs, not seemingly heartbroken ewoks. Thank you... still 10/10 would console https://t.co/HIraYS1Bzo', 'truncated': False, 'display_text_range': [0, 116], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 809920757623115780, 'id_str': '809920757623115780', 'indices': [117, 140], 'media_url': 'http://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'url': 'https://t.co/HIraYS1Bzo', 'display_url': 'pic.twitter.com/HIraYS1Bzo', 'expanded_url': 'https://twitter.com/dog_rates/status/809920764300447744/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 491, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 867, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1149, 'h': 1590, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 809920757623115780, 'id_str': '809920757623115780', 'indices': [117, 140], 'media_url': 'http://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'url': 'https://t.co/HIraYS1Bzo', 'display_url': 'pic.twitter.com/HIraYS1Bzo', 'expanded_url': 'https://twitter.com/dog_rates/status/809920764300447744/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 491, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 867, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1149, 'h': 1590, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 
'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815731, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 3982, 'favorite_count': 15699, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
..
{'created_at': 'Sun Feb 19 01:23:00 +0000 2017', 'id': 833124694597443584, 'id_str': '833124694597443584', 'full_text': 'This is Gidget. She's a spy pupper. Stealthy as h*ck. Must've slipped pup and got caught. 12/10 would forgive then pet https://t.co/zD97KYFaFa', 'truncated': False, 'display_text_range': [0, 118], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 833124662091542528, 'id_str': '833124662091542528', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 833124662091542528, 'id_str': '833124662091542528', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}, {'id': 833124662095679488, 'id_str': '833124662095679488', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1HUkAAWbJp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1HUkAAWbJp.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 
'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}, {'id': 833124662099877889, 'id_str': '833124662099877889', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1IUoAEspsk.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1IUoAEspsk.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 1150, 'h': 2048, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 674, 'h': 1200, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815731, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 
'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4802, 'favorite_count': 20111, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Wed Dec 16 01:27:03 +0000 2015', 'id': 676936541936185344, 'id_str': '676936541936185344', 'full_text': 'Here we see a rare pouched pupper. Ample storage space. Looks alert. Jumps at random. Kicked open that door. 8/10 https://t.co/mqvaxleHRz', 'truncated': False, 'display_text_range': [0, 137], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 676936535535656961, 'id_str': '676936535535656961', 'indices': [114, 137], 'media_url': 'http://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'url': 'https://t.co/mqvaxleHRz', 'display_url': 'pic.twitter.com/mqvaxleHRz', 'expanded_url': 'https://twitter.com/dog_rates/status/676936541936185344/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 676936535535656961, 'id_str': '676936535535656961', 'indices': [114, 137], 'media_url': 'http://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'url': 'https://t.co/mqvaxleHRz', 'display_url': 'pic.twitter.com/mqvaxleHRz', 'expanded_url': 'https://twitter.com/dog_rates/status/676936541936185344/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 
'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815740, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4787, 'favorite_count': 12377, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Sun Nov 20 00:59:15 +0000 2016', 'id': 800141422401830912, 'id_str': '800141422401830912', 'full_text': 'This is Peaches. She's the ultimate selfie sidekick. Super sneaky tongue slip appreciated. 13/10 https://t.co/pbKOesr8Tg', 'truncated': False, 'display_text_range': [0, 96], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 800141411257643009, 'id_str': '800141411257643009', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 800141411257643009, 'id_str': '800141411257643009', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}, {'id': 800141411266007041, 'id_str': '800141411266007041', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8yXEAEkgUe.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8yXEAEkgUe.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 
'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}, {'id': 800141411844837376, 'id_str': '800141411844837376', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX-8XUAAEvjD.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX-8XUAAEvjD.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815734, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 
'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 2573, 'favorite_count': 15455, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Tue Jul 05 20:41:01 +0000 2016', 'id': 750429297815552001, 'id_str': '750429297815552001', 'full_text': 'This is Arnie. He's a Nova Scotian Fridge Floof. Rare af. 12/10 https://t.co/lprdOylVpS', 'truncated': False, 'display_text_range': [0, 63], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 750429289032642560, 'id_str': '750429289032642560', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 750429289032642560, 'id_str': '750429289032642560', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}, {'id': 750429288596373504, 'id_str': '750429288596373504', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdkfWAAAagwY.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdkfWAAAagwY.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'small': {'w': 
510, 'h': 680, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815743, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 
'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4238, 'favorite_count': 13094, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
{'created_at': 'Wed Jan 06 20:16:44 +0000 2016', 'id': 684830982659280897, 'id_str': '684830982659280897', 'full_text': 'This little fella really hates stairs. Prefers bush. 13/10 legendary pupper https://t.co/e3LPMAHj7p', 'truncated': False, 'display_text_range': [0, 99], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/e3LPMAHj7p', 'expanded_url': 'https://vine.co/v/eEZXZI1rqxX', 'display_url': 'vine.co/v/eEZXZI1rqxX', 'indices': [76, 99]}]}, 'source': '<a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815741, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 21334, 'favorite_count': 34569, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'} 1
Name: retweeted_status, Length: 162, dtype: int64
The assessment above surfaces two kinds of issues: quality issues and tidiness issues.
Quality: issues with content. Low-quality data is also known as dirty data.
archive dataframe:
image dataframe:
tweepy dataframe:
Tidiness: issues with structure that prevent easy analysis. Untidy data is also known as messy data.
archive dataframe:
image dataframe:
tweepy dataframe:
The programmatic data cleaning process:
As always, we copy each dataframe before cleaning it, so that we can refer back to the original.
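Why the copy matters is easy to see on a toy frame. The sketch below uses a made-up two-row dataframe (`toy_df` is a hypothetical name, not part of the project) to show that a plain assignment shares data with the original, while `.copy()` does not.

```python
import pandas as pd

# Hypothetical toy data standing in for the real archive dataframe
toy_df = pd.DataFrame({"tweet_id": [1, 2], "rating_numerator": [12, 13]})

alias = toy_df          # same object: edits would show up in toy_df too
clean = toy_df.copy()   # independent copy: edits stay local

# Edit the copy only
clean.loc[0, "rating_numerator"] = 99

alias_is_same_object = alias is toy_df
original_value = toy_df.loc[0, "rating_numerator"]
print(alias_is_same_object, original_value)
```

The original value survives the edit, so the untouched frame can still be used to compare text and ratings after cleaning.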
What we will do for this dataframe:
- .drop() method
- .astype() method
- .astype() method
# Prepare: copy the original dataframe
archive_df_clean = archive_df.copy()
# Define: Remove columns that are not useful for the analysis
# Code
# Named cols_to_drop so the built-in list is not shadowed
cols_to_drop = ['in_reply_to_status_id',
                'in_reply_to_user_id', 'source', 'expanded_urls']
archive_df_clean.drop(cols_to_drop, axis=1, inplace=True)
# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   tweet_id                    2356 non-null   int64
 1   timestamp                   2356 non-null   object
 2   text                        2356 non-null   object
 3   retweeted_status_id         181 non-null    float64
 4   retweeted_status_user_id    181 non-null    float64
 5   retweeted_status_timestamp  181 non-null    object
 6   rating_numerator            2356 non-null   int64
 7   rating_denominator          2356 non-null   int64
 8   name                        2356 non-null   object
 9   doggo                       2356 non-null   object
 10  floofer                     2356 non-null   object
 11  pupper                      2356 non-null   object
 12  puppo                       2356 non-null   object
dtypes: float64(2), int64(3), object(8)
memory usage: 239.4+ KB
Based on .info(), 181 rows have a non-null retweeted_status_id, which means they are retweets rather than original tweets.
# Define: Keep only the rows that have a null value in the retweeted_status_id column
# Code
archive_df_clean = archive_df_clean[archive_df_clean['retweeted_status_id'].isnull()]
# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   tweet_id                    2175 non-null   int64
 1   timestamp                   2175 non-null   object
 2   text                        2175 non-null   object
 3   retweeted_status_id         0 non-null      float64
 4   retweeted_status_user_id    0 non-null      float64
 5   retweeted_status_timestamp  0 non-null      object
 6   rating_numerator            2175 non-null   int64
 7   rating_denominator          2175 non-null   int64
 8   name                        2175 non-null   object
 9   doggo                       2175 non-null   object
 10  floofer                     2175 non-null   object
 11  pupper                      2175 non-null   object
 12  puppo                       2175 non-null   object
dtypes: float64(2), int64(3), object(8)
memory usage: 237.9+ KB
# Define: Remove the retweet columns, which are now empty
# Code
# Named retweet_cols so the built-in list is not shadowed
retweet_cols = ['retweeted_status_id', 'retweeted_status_user_id',
                'retweeted_status_timestamp']
archive_df_clean.drop(retweet_cols, axis=1, inplace=True)
# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   tweet_id            2175 non-null   int64
 1   timestamp           2175 non-null   object
 2   text                2175 non-null   object
 3   rating_numerator    2175 non-null   int64
 4   rating_denominator  2175 non-null   int64
 5   name                2175 non-null   object
 6   doggo               2175 non-null   object
 7   floofer             2175 non-null   object
 8   pupper              2175 non-null   object
 9   puppo               2175 non-null   object
dtypes: int64(3), object(7)
memory usage: 186.9+ KB
# Define: Fix the wrong dtypes using .astype()
# Code
# Named dtype_map so the built-in dict is not shadowed
dtype_map = {'tweet_id': 'object', 'timestamp': 'datetime64[ns]'}
archive_df_clean = archive_df_clean.astype(dtype_map)
# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   tweet_id            2175 non-null   object
 1   timestamp           2175 non-null   datetime64[ns]
 2   text                2175 non-null   object
 3   rating_numerator    2175 non-null   int64
 4   rating_denominator  2175 non-null   int64
 5   name                2175 non-null   object
 6   doggo               2175 non-null   object
 7   floofer             2175 non-null   object
 8   pupper              2175 non-null   object
 9   puppo               2175 non-null   object
dtypes: datetime64[ns](1), int64(2), object(7)
memory usage: 186.9+ KB
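As a possible alternative to a single `.astype()` call, the two conversions can be done separately: `astype(str)` turns the IDs into real strings, and `pd.to_datetime` parses the timestamps. This is only a sketch on a toy frame. The IDs and times are taken from rows shown in this notebook, but the `+0000` suffix on the raw timestamp strings is an assumption about the archive's format.

```python
import pandas as pd

# Toy sample; the "+0000" timestamp suffix is an assumed raw format
toy = pd.DataFrame({
    "tweet_id": [855851453814013952, 854010172552949760],
    "timestamp": ["2017-04-22 18:31:02 +0000", "2017-04-17 16:34:26 +0000"],
})

# IDs are labels, not quantities, so store them as strings
toy["tweet_id"] = toy["tweet_id"].astype(str)
# Parse the timestamps as timezone-aware UTC datetimes
toy["timestamp"] = pd.to_datetime(toy["timestamp"], utc=True)

print(toy.dtypes)
```

String IDs cannot be summed or averaged by accident, and they are not at risk of losing digits if a column is ever converted to float.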
wrong_detection_index = [516, 1202, 2335, 342]
for s in wrong_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
516 Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. Keep Sam smiling by clicking and sharing this link: https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx 24 7
1202 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq 50 50
2335 This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv 1 2
342 @docmisterio account started on 11/15/15 11 15
# Define: These ratings were detected incorrectly, so we update each case manually
# Index 516: set the numerator and denominator to NaN
# Code
archive_df_clean.loc[516, 'rating_numerator'] = np.nan
archive_df_clean.loc[516, 'rating_denominator'] = np.nan
# Define: Index 1202, change the numerator to 11 and the denominator to 10
# Code
archive_df_clean.loc[1202, 'rating_numerator'] = 11
archive_df_clean.loc[1202, 'rating_denominator'] = 10
# Define: Index 2335, change the numerator to 9 and the denominator to 10
# Code
archive_df_clean.loc[2335, 'rating_numerator'] = 9
archive_df_clean.loc[2335, 'rating_denominator'] = 10
# Define: Index 342, set the numerator and denominator to NaN
# Code
archive_df_clean.loc[342, 'rating_numerator'] = np.nan
archive_df_clean.loc[342, 'rating_denominator'] = np.nan
# Test all of the above
wrong_detection_index = [516, 1202, 2335, 342]
for s in wrong_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
516 Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. Keep Sam smiling by clicking and sharing this link: https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx nan nan
1202 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq 11.0 10.0
2335 This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv 9.0 10.0
342 @docmisterio account started on 11/15/15 nan nan
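The four rows above were spotted by eye. A programmatic check can flag similar candidates: a tweet is suspicious when its text contains more than one `number/number` pattern, or when the only pattern found does not have a denominator of 10. The sketch below uses shortened versions of the texts printed above. `find_fractions` is a hypothetical helper, not part of the original notebook.

```python
import re

# Integer or decimal numerator, a slash, then an integer denominator
FRACTION = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")

def find_fractions(text):
    """Return every (numerator, denominator) pair found in the text, as strings."""
    return FRACTION.findall(text)

# Shortened tweet texts, keyed by their index in the archive
texts = {
    516: "Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer.",
    1202: "He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10",
    342: "@docmisterio account started on 11/15/15",
}

# Flag texts with several candidates or an unusual denominator
suspicious = {}
for idx, text in texts.items():
    pairs = find_fractions(text)
    if len(pairs) > 1 or any(den != "10" for _, den in pairs):
        suspicious[idx] = pairs

print(suspicious)
```

A flagged row is only a candidate for review. A denominator other than 10 may still be a genuine rating, so the final decision stays manual.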
Some numerators are decimals, as at indexes 763 and 1712, and only the digits after the decimal point were captured. Other rows may have the same problem, so we re-assess the data.
decimal_detection_index = [763, 1712]
for s in decimal_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
763 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 27.0 10.0
1712 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 26.0 10.0
# Check all occurrences of decimal ratings
# The pattern is compiled once, outside the loop
regexp = re.compile(r'(\d+\.\d*\/\d+)')
for s in archive_df_clean.index.to_list():
    text = archive_df_clean['text'][s]
    if regexp.search(text):
        print(s, "\t", archive_df['text'][s],
              "\t", archive_df_clean['rating_numerator'][s],
              "\t", archive_df_clean['rating_denominator'][s])
45 This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948 5.0 10.0
695 This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS 75.0 10.0
763 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 27.0 10.0
1689 I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace 5.0 10.0
1712 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 26.0 10.0
# Define: Fix the decimal numerators by re-extracting those ratings from the text
# Code
# Only rows whose text contains a decimal rating are updated, so the manual fixes above are kept
rating = archive_df_clean['text'].str.extract(r'(\d+\.\d+)/(\d+)', expand=True)
rating = rating.dropna().astype(float)
rating.columns = ['rating_numerator', 'rating_denominator']
archive_df_clean.loc[rating.index, 'rating_numerator'] = rating['rating_numerator']
archive_df_clean.loc[rating.index, 'rating_denominator'] = rating['rating_denominator']
# Test
regexp = re.compile(r'(\d+\.\d*\/\d+)')
for s in archive_df_clean.index.to_list():
    text = archive_df_clean['text'][s]
    if regexp.search(text):
        print(s, "\t", archive_df['text'][s],
              "\t", archive_df_clean['rating_numerator'][s],
              "\t", archive_df_clean['rating_denominator'][s])
45 This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948 13.5 10.0
695 This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS 9.75 10.0
763 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 11.27 10.0
1689 I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace 9.5 10.0
1712 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 11.26 10.0
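The wrong numerators (5, 75, 26) are what a pattern produces when it only allows whole numbers before the slash: the match starts after the decimal point. The sketch below compares such an integer-only pattern with one that accepts an optional decimal part. The integer-only pattern is an assumed reconstruction of how the archive was first parsed, and the texts are shortened.

```python
import pandas as pd

# Shortened tweet texts, keyed by their index in the archive
texts = pd.Series({
    45: "she is also offering you her favorite monkey. 13.5/10",
    695: "H*ckin magical af 9.75/10",
    1712: "Average of 11.26/10",
})

# Integer-only numerator: an assumed reconstruction of the original parsing
naive = texts.str.extract(r"(\d+)/(\d+)").astype(float)
# Numerator with an optional decimal part
fixed = texts.str.extract(r"(\d+(?:\.\d+)?)/(\d+)").astype(float)

print(naive[0].tolist())
print(fixed[0].tolist())
```

Both patterns return the first match in each text, so neither helps with tweets like index 1202, where an unrelated fraction comes before the rating.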
# Define: Replace the string 'None' with NaN
# Code
archive_df_clean['name'] = archive_df_clean['name'].replace('None', np.nan)
# Test
archive_df_clean.name.sample(10)
954 Fred
1159 Sarge
1500 Edgar
2016 Bradley
38 Earl
950 Brody
1053 NaN
1548 Lucky
2185 Ruby
1542 NaN
Name: name, dtype: object
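The placeholder 'None' may not be the only kind of bad name. One of the multi-stage rows in this notebook has the name "just", which looks like an ordinary word picked up from "This is just ...". A possible extra check, sketched here on a made-up sample, is to treat any all-lowercase name as missing. This rule is an assumption and would need to be checked against the real column before use.

```python
import pandas as pd

# Hypothetical sample of the name column; "just" and "a" stand in for misdetected words
names = pd.Series(["Fred", "None", "just", "Sarge", "a"])

# Real names are assumed to start with a capital letter
is_lowercase_word = names.str.islower()
is_placeholder = names == "None"

# Masked entries become NaN
cleaned = names.mask(is_lowercase_word | is_placeholder)
print(cleaned.tolist())
```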
# Before cleaning the dog stage columns, we need to change the 'None' values to 0
# Define: Change the 'None' values to 0
# Code
stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']
# One call replaces the values in all four columns, so no loop is needed
archive_df_clean[stage_cols] = archive_df_clean[stage_cols].replace('None', 0)
# Test
archive_df_clean[stage_cols].sample(10)
| | doggo | floofer | pupper | puppo |
|---|---|---|---|---|
| 2058 | 0 | 0 | 0 | 0 |
| 330 | 0 | 0 | pupper | 0 |
| 505 | 0 | 0 | 0 | 0 |
| 31 | 0 | 0 | 0 | 0 |
| 2240 | 0 | 0 | 0 | 0 |
| 1937 | 0 | 0 | pupper | 0 |
| 2085 | 0 | 0 | 0 | 0 |
| 1827 | 0 | 0 | 0 | 0 |
| 2250 | 0 | 0 | 0 | 0 |
| 771 | 0 | 0 | 0 | 0 |
# Define: Combine the four dog stage columns into one concise column
# Code
stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']
dog_stage = []
for idx, row in archive_df_clean.iterrows():
    # Stages detected in this row (missing stages were replaced by 0 above)
    stages = [row[c] for c in stage_cols if row[c]]
    if len(stages) == 1:
        dog_stage.append(stages[0])
    elif len(stages) > 1:
        dog_stage.append('multiple_stages')
    else:
        dog_stage.append(np.nan)
# Add the new column to the archive dataframe
archive_df_clean['dog_stage'] = dog_stage
# Test
archive_df_clean['dog_stage'].sample(10)
1952 NaN
87 NaN
1345 NaN
1571 pupper
205 NaN
1148 NaN
1958 NaN
2230 NaN
2254 NaN
893 NaN
Name: dog_stage, dtype: object
archive_df_clean['dog_stage'].value_counts()
pupper 224
doggo 75
puppo 24
multiple_stages 12
floofer 9
Name: dog_stage, dtype: int64
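The row-by-row loop works, but the same column can also be built without `iterrows()`. The following is a minimal sketch on a made-up three-row frame with the same shape as the stage columns (0 marks "no stage"). It illustrates an alternative and is not the notebook's own method.

```python
import pandas as pd

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Hypothetical rows: one stage, two stages, no stage
toy = pd.DataFrame(
    [[0, 0, "pupper", 0], ["doggo", 0, "pupper", 0], [0, 0, 0, 0]],
    columns=stage_cols,
    dtype=object,
)

present = toy[stage_cols].ne(0)      # True where a stage was detected
n_stages = present.sum(axis=1)       # number of detected stages per row

# Concatenate the detected stage names per row (empty string when none)
joined = toy[stage_cols].where(present, "").apply("".join, axis=1)

# Keep the name only when exactly one stage is present; other rows become NaN
toy["dog_stage"] = joined.where(n_stages == 1)
toy.loc[n_stages > 1, "dog_stage"] = "multiple_stages"
print(toy["dog_stage"].tolist())
```

On a few thousand rows the speed difference is small. The main gain is that the three cases (one stage, several, none) are each stated in a single line.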
Since 12 rows have multiple_stages, we need to examine them further. This may be caused by more than one dog in the post or by wrong auto-detection.
archive_df_clean[archive_df_clean['dog_stage'] == 'multiple_stages']
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo | dog_stage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 191 | 855851453814013952 | 2017-04-22 18:31:02 | Here's a puppo participating in the #ScienceMa... | 13.0 | 10.0 | NaN | doggo | 0 | 0 | puppo | multiple_stages |
| 200 | 854010172552949760 | 2017-04-17 16:34:26 | At first I thought this was a shy doggo, but i... | 11.0 | 10.0 | NaN | doggo | floofer | 0 | 0 | multiple_stages |
| 460 | 817777686764523521 | 2017-01-07 16:59:28 | This is Dido. She's playing the lead role in "... | 13.0 | 10.0 | Dido | doggo | 0 | pupper | 0 | multiple_stages |
| 531 | 808106460588765185 | 2016-12-12 00:29:28 | Here we have Burke (pupper) and Dexter (doggo)... | 12.0 | 10.0 | NaN | doggo | 0 | pupper | 0 | multiple_stages |
| 565 | 802265048156610565 | 2016-11-25 21:37:47 | Like doggo, like pupper version 2. Both 11/10 ... | 11.0 | 10.0 | NaN | doggo | 0 | pupper | 0 | multiple_stages |
| 575 | 801115127852503040 | 2016-11-22 17:28:25 | This is Bones. He's being haunted by another d... | 12.0 | 10.0 | Bones | doggo | 0 | pupper | 0 | multiple_stages |
| 705 | 785639753186217984 | 2016-10-11 00:34:48 | This is Pinot. He's a sophisticated doggo. You... | 10.0 | 10.0 | Pinot | doggo | 0 | pupper | 0 | multiple_stages |
| 733 | 781308096455073793 | 2016-09-29 01:42:20 | Pupper butt 1, Doggo 0. Both 12/10 https://t.c... | 12.0 | 10.0 | NaN | doggo | 0 | pupper | 0 | multiple_stages |
| 889 | 759793422261743616 | 2016-07-31 16:50:42 | Meet Maggie & Lila. Maggie is the doggo, L... | 12.0 | 10.0 | Maggie | doggo | 0 | pupper | 0 | multiple_stages |
| 956 | 751583847268179968 | 2016-07-09 01:08:47 | Please stop sending it pictures that don't eve... | 5.0 | 10.0 | NaN | doggo | 0 | pupper | 0 | multiple_stages |
| 1063 | 741067306818797568 | 2016-06-10 00:39:48 | This is just downright precious af. 12/10 for ... | 12.0 | 10.0 | just | doggo | 0 | pupper | 0 | multiple_stages |
| 1113 | 733109485275860992 | 2016-05-19 01:38:16 | Like father (doggo), like son (pupper). Both 1... | 12.0 | 10.0 | NaN | doggo | 0 | pupper | 0 | multiple_stages |
# Visually check the stages from the text post
multiple_stages_index = archive_df_clean[archive_df_clean['dog_stage'] == 'multiple_stages'].index.to_list()
for s in multiple_stages_index:
    print(s, "\t", archive_df_clean['text'][s],
          "\n", archive_df_clean['dog_stage'][s])
191 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel
multiple_stages
200 At first I thought this was a shy doggo, but it's actually a Rare Canadian Floofer Owl. Amateurs would confuse the two. 11/10 only send dogs https://t.co/TXdT3tmuYk
multiple_stages
460 This is Dido. She's playing the lead role in "Pupper Stops to Catch Snow Before Resuming Shadow Box with Dried Apple." 13/10 (IG: didodoggo) https://t.co/m7isZrOBX7
multiple_stages
531 Here we have Burke (pupper) and Dexter (doggo). Pupper wants to be exactly like doggo. Both 12/10 would pet at same time https://t.co/ANBpEYHaho
multiple_stages
565 Like doggo, like pupper version 2. Both 11/10 https://t.co/9IxWAXFqze
multiple_stages
575 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj
multiple_stages
705 This is Pinot. He's a sophisticated doggo. You can tell by the hat. Also pointier than your average pupper. Still 10/10 would pet cautiously https://t.co/f2wmLZTPHd
multiple_stages
733 Pupper butt 1, Doggo 0. Both 12/10 https://t.co/WQvcPEpH2u
multiple_stages
889 Meet Maggie &amp; Lila. Maggie is the doggo, Lila is the pupper. They are sisters. Both 12/10 would pet at the same time https://t.co/MYwR4DQKll
multiple_stages
956 Please stop sending it pictures that don't even have a doggo or pupper in them. Churlish af. 5/10 neat couch tho https://t.co/u2c9c7qSg8
multiple_stages
1063 This is just downright precious af. 12/10 for both pupper and doggo https://t.co/o5J479bZUC
multiple_stages
1113 Like father (doggo), like son (pupper). Both 12/10 https://t.co/pG2inLaOda
multiple_stages
# Define: We need to fix the stages one by one
# Code
# Make dictionary with key=index, value=fixed stage
dict_stage = {191: 'puppo',
              200: 'floofer',
              460: 'pupper',
              575: 'pupper',
              705: np.nan,  # not even a dog
              # 965 is not among the multiple_stages indices printed above (956 is); verify this key
              965: 'doggo'}
for key, value in dict_stage.items():
    archive_df_clean.loc[key, 'dog_stage'] = value
# Test
for s in dict_stage.keys():
    print(s, "\t", archive_df_clean['text'][s],
          "\n", archive_df_clean['dog_stage'][s])
191 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel
puppo
200 At first I thought this was a shy doggo, but it's actually a Rare Canadian Floofer Owl. Amateurs would confuse the two. 11/10 only send dogs https://t.co/TXdT3tmuYk
floofer
460 This is Dido. She's playing the lead role in "Pupper Stops to Catch Snow Before Resuming Shadow Box with Dried Apple." 13/10 (IG: didodoggo) https://t.co/m7isZrOBX7
pupper
575 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj
pupper
705 This is Pinot. He's a sophisticated doggo. You can tell by the hat. Also pointier than your average pupper. Still 10/10 would pet cautiously https://t.co/f2wmLZTPHd
nan
965 This is Arnie. He's a Nova Scotian Fridge Floof. Rare af. 12/10 https://t.co/lprdOylVpS
doggo
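The test output above includes index 965 (Arnie), which was not among the twelve multiple_stages rows printed earlier, while 956 was. A small guard before applying manual corrections could catch such a mismatch. This is a sketch only: the frame `toy` and the dictionary `fixes` are hypothetical stand-ins for archive_df_clean and dict_stage.

```python
import pandas as pd

# Toy stand-in for archive_df_clean (hypothetical data); the index labels matter here
toy = pd.DataFrame(
    {'dog_stage': ['multiple_stages', 'multiple_stages', None]},
    index=[191, 956, 965],
)

# Manual corrections keyed by index label; 965 mimics a key outside the flagged rows
fixes = {191: 'puppo', 965: 'doggo'}

# Index labels that are actually flagged for manual review
multi_idx = set(toy.index[toy['dog_stage'] == 'multiple_stages'])

# Keys we are about to overwrite that were never flagged
unexpected = sorted(set(fixes) - multi_idx)

if unexpected:
    print('Keys not flagged as multiple_stages:', unexpected)
```

Turning the print into a raised exception would stop the notebook before an unrelated row is overwritten.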
# Define: Change dog_stage dtype to category
# Code
archive_df_clean.dog_stage = archive_df_clean.dog_stage.astype('category')
# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   tweet_id            2175 non-null   object
 1   timestamp           2175 non-null   datetime64[ns]
 2   text                2175 non-null   object
 3   rating_numerator    2175 non-null   float64
 4   rating_denominator  2175 non-null   float64
 5   name                1495 non-null   object
 6   doggo               2175 non-null   object
 7   floofer             2175 non-null   object
 8   pupper              2175 non-null   object
 9   puppo               2175 non-null   object
 10  dog_stage           344 non-null    category
dtypes: category(1), datetime64[ns](1), float64(2), object(7)
memory usage: 269.2+ KB
# Define: Drop the doggo, floofer, pupper, and puppo columns
# Code
stages = ['doggo', 'floofer', 'pupper', 'puppo']
archive_df_clean.drop(stages, axis=1, inplace=True)
# Test
archive_df_clean.sample(10)
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage |
|---|---|---|---|---|---|---|---|
| 2028 | 671866342182637568 | 2015-12-02 01:39:53 | Meet Dylan. He can use a fork but clearly can'... | 10.0 | 10.0 | Dylan | NaN |
| 1871 | 675147105808306176 | 2015-12-11 02:56:28 | When you're presenting a group project and the... | 10.0 | 10.0 | NaN | NaN |
| 1903 | 674638615994089473 | 2015-12-09 17:15:54 | This pupper is fed up with being tickled. 12/1... | 12.0 | 10.0 | NaN | pupper |
| 1743 | 679405845277462528 | 2015-12-22 20:59:10 | Crazy unseen footage from Jurassic Park. 10/10... | 10.0 | 10.0 | NaN | NaN |
| 468 | 817056546584727552 | 2017-01-05 17:13:55 | This is Chloe. She fell asleep at the wheel. A... | 11.0 | 10.0 | Chloe | NaN |
| 1062 | 741099773336379392 | 2016-06-10 02:48:49 | This is Ted. He's given up. 11/10 relatable af... | 11.0 | 10.0 | Ted | NaN |
| 2293 | 667152164079423490 | 2015-11-19 01:27:25 | This is Pipsy. He is a fluffball. Enjoys trave... | 12.0 | 10.0 | Pipsy | NaN |
| 372 | 828381636999917570 | 2017-02-05 23:15:47 | Meet Doobert. He's a deaf doggo. Didn't stop h... | 14.0 | 10.0 | Doobert | doggo |
| 576 | 800859414831898624 | 2016-11-22 00:32:18 | @SkyWilliams doggo simply protecting you from ... | 11.0 | 10.0 | NaN | doggo |
| 503 | 813066809284972545 | 2016-12-25 17:00:08 | This is Tyr. He is disgusted by holiday traffi... | 12.0 | 10.0 | Tyr | NaN |
# Define: Final check and reset index
# Code
archive_df_clean.reset_index(drop=True, inplace=True)
# Test
archive_df_clean
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage |
|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | 2017-08-01 16:23:56 | This is Phineas. He's a mystical boy. Only eve... | 13.0 | 10.0 | Phineas | NaN |
| 1 | 892177421306343426 | 2017-08-01 00:17:27 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | Tilly | NaN |
| 2 | 891815181378084864 | 2017-07-31 00:18:03 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | Archie | NaN |
| 3 | 891689557279858688 | 2017-07-30 15:58:51 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | Darla | NaN |
| 4 | 891327558926688256 | 2017-07-29 16:00:24 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | Franklin | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2170 | 666049248165822465 | 2015-11-16 00:24:50 | Here we have a 1949 1st generation vulpix. Enj... | 5.0 | 10.0 | NaN | NaN |
| 2171 | 666044226329800704 | 2015-11-16 00:04:52 | This is a purebred Piers Morgan. Loves to Netf... | 6.0 | 10.0 | a | NaN |
| 2172 | 666033412701032449 | 2015-11-15 23:21:54 | Here is a very happy pup. Big fan of well-main... | 9.0 | 10.0 | a | NaN |
| 2173 | 666029285002620928 | 2015-11-15 23:05:30 | This is a western brown Mitsubishi terrier. Up... | 7.0 | 10.0 | a | NaN |
| 2174 | 666020888022790149 | 2015-11-15 22:32:08 | Here we have a Japanese Irish Setter. Lost eye... | 8.0 | 10.0 | NaN | NaN |
2175 rows × 7 columns
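For the image predictions dataframe, we will change tweet_id to object, drop rows with a duplicated jpg_url, keep a single breed prediction with its confidence, remove the columns that are no longer needed, convert the breed column to category, and reset the index.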
image_df_clean = image_df.copy()
image_df_clean
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
| 2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
| 3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
| 4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
| 2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
| 2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
| 2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
| 2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
# Define: Change tweet_id dtype to object
# Code
image_df_clean.tweet_id = image_df_clean.tweet_id.astype('object')
# Test
image_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   tweet_id  2075 non-null   object
 1   jpg_url   2075 non-null   object
 2   img_num   2075 non-null   int64
 3   p1        2075 non-null   object
 4   p1_conf   2075 non-null   float64
 5   p1_dog    2075 non-null   bool
 6   p2        2075 non-null   object
 7   p2_conf   2075 non-null   float64
 8   p2_dog    2075 non-null   bool
 9   p3        2075 non-null   object
 10  p3_conf   2075 non-null   float64
 11  p3_dog    2075 non-null   bool
dtypes: bool(3), float64(3), int64(1), object(5)
memory usage: 152.1+ KB
From the assessment, we found that there are 66 rows with a duplicated jpg_url.
image_df_clean[image_df_clean.jpg_url.duplicated()]
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1297 | 752309394570878976 | https://pbs.twimg.com/ext_tw_video_thumb/67535... | 1 | upright | 0.303415 | False | golden_retriever | 0.181351 | True | Brittany_spaniel | 0.162084 | True |
| 1315 | 754874841593970688 | https://pbs.twimg.com/media/CWza7kpWcAAdYLc.jpg | 1 | pug | 0.272205 | True | bull_mastiff | 0.251530 | True | bath_towel | 0.116806 | False |
| 1333 | 757729163776290825 | https://pbs.twimg.com/media/CWyD2HGUYAQ1Xa7.jpg | 2 | cash_machine | 0.802333 | False | schipperke | 0.045519 | True | German_shepherd | 0.023353 | True |
| 1345 | 759159934323924993 | https://pbs.twimg.com/media/CU1zsMSUAAAS0qW.jpg | 1 | Irish_terrier | 0.254856 | True | briard | 0.227716 | True | soft-coated_wheaten_terrier | 0.223263 | True |
| 1349 | 759566828574212096 | https://pbs.twimg.com/media/CkNjahBXAAQ2kWo.jpg | 1 | Labrador_retriever | 0.967397 | True | golden_retriever | 0.016641 | True | ice_bear | 0.014858 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1903 | 851953902622658560 | https://pbs.twimg.com/media/C4KHj-nWQAA3poV.jpg | 1 | Staffordshire_bullterrier | 0.757547 | True | American_Staffordshire_terrier | 0.149950 | True | Chesapeake_Bay_retriever | 0.047523 | True |
| 1944 | 861769973181624320 | https://pbs.twimg.com/media/CzG425nWgAAnP7P.jpg | 2 | Arabian_camel | 0.366248 | False | house_finch | 0.209852 | False | cocker_spaniel | 0.046403 | True |
| 1992 | 873697596434513921 | https://pbs.twimg.com/media/DA7iHL5U0AA1OQo.jpg | 1 | laptop | 0.153718 | False | French_bulldog | 0.099984 | True | printer | 0.077130 | False |
| 2041 | 885311592912609280 | https://pbs.twimg.com/media/C4bTH6nWMAAX_bJ.jpg | 1 | Labrador_retriever | 0.908703 | True | seat_belt | 0.057091 | False | pug | 0.011933 | True |
| 2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
66 rows × 12 columns
# Define: Drop the rows with a duplicated jpg_url, keeping the first occurrence
# Code
image_df_clean.drop_duplicates(subset='jpg_url', keep='first', inplace=True)
# Test
image_df_clean[image_df_clean.jpg_url.duplicated()]
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
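Which copy survives drop_duplicates depends on row order, so it can help to look at every member of a duplicated group before dropping. The sketch below uses a made-up frame `toy` with placeholder IDs and URLs; duplicated(keep=False) marks all rows that share a jpg_url rather than only the later ones.

```python
import pandas as pd

# Toy stand-in for the image predictions table (hypothetical IDs and URLs)
toy = pd.DataFrame({
    'tweet_id': ['1', '2', '3'],
    'jpg_url': ['a.jpg', 'b.jpg', 'a.jpg'],
})

# keep=False flags every row in a duplicated group, not just the repeats
both = toy[toy['jpg_url'].duplicated(keep=False)].sort_values(['jpg_url', 'tweet_id'])

# keep='first' retains the earliest row of each group, as in the cell above
deduped = toy.drop_duplicates(subset='jpg_url', keep='first')

print(both)
print(deduped)
```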
Next, we make two new columns, dog_type and p_conf, holding the breed and confidence of the first prediction (p1, p2, or p3) that is flagged as a dog.
# Define
# Iterate over the rows and take the breed and confidence from the first
# prediction (p1, then p2, then p3) whose p*_dog flag is True
# Code
dog_type = []
p_conf = []
for idx, row in image_df_clean.iterrows():
    if row['p1_dog']:
        dog_type.append(row['p1'])
        p_conf.append(row['p1_conf'])
    elif row['p2_dog']:
        dog_type.append(row['p2'])
        p_conf.append(row['p2_conf'])
    elif row['p3_dog']:
        dog_type.append(row['p3'])
        p_conf.append(row['p3_conf'])
    else:
        # None of the three predictions is a dog
        dog_type.append(np.nan)
        p_conf.append(np.nan)
# Make new columns for image dataframe
image_df_clean['dog_type'] = dog_type
image_df_clean['p_conf'] = p_conf
# Test
image_df_clean.sample(10)
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog | dog_type | p_conf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1112 | 724049859469295616 | https://pbs.twimg.com/media/CgxXf1TWYAEjY61.jpg | 1 | Border_collie | 0.581835 | True | collie | 0.344588 | True | Shetland_sheepdog | 0.043584 | True | Border_collie | 0.581835 |
| 150 | 668641109086707712 | https://pbs.twimg.com/media/CUd9ivxWUAAuXSQ.jpg | 1 | vacuum | 0.432594 | False | pug | 0.146311 | True | toilet_tissue | 0.024500 | False | NaN | NaN |
| 1608 | 800751577355128832 | https://pbs.twimg.com/media/CxzXOyBW8AEu_Oi.jpg | 2 | cocker_spaniel | 0.771984 | True | miniature_poodle | 0.076653 | True | toy_poodle | 0.039618 | True | cocker_spaniel | 0.771984 |
| 749 | 687818504314159109 | https://pbs.twimg.com/media/CYufR8_WQAAWCqo.jpg | 1 | Lakeland_terrier | 0.873029 | True | soft-coated_wheaten_terrier | 0.060924 | True | toy_poodle | 0.017031 | True | Lakeland_terrier | 0.873029 |
| 1749 | 823699002998870016 | https://pbs.twimg.com/media/C25d3nkXEAAFBUN.jpg | 1 | cairn | 0.203999 | True | snorkel | 0.171893 | False | Norfolk_terrier | 0.107543 | True | cairn | 0.203999 |
| 1898 | 850753642995093505 | https://pbs.twimg.com/media/C8576jrW0AEYWFy.jpg | 1 | pug | 0.996952 | True | bull_mastiff | 0.000996 | True | French_bulldog | 0.000883 | True | pug | 0.996952 |
| 486 | 675497103322386432 | https://pbs.twimg.com/media/CV_ZAhcUkAUeKtZ.jpg | 1 | vizsla | 0.519589 | True | miniature_pinscher | 0.064771 | True | Rhodesian_ridgeback | 0.061491 | True | vizsla | 0.519589 |
| 739 | 687127927494963200 | https://pbs.twimg.com/media/CYkrNIVWcAMswmP.jpg | 1 | pug | 0.178205 | True | Chihuahua | 0.149164 | True | Shih-Tzu | 0.120505 | True | pug | 0.178205 |
| 1790 | 830097400375152640 | https://pbs.twimg.com/media/C4UZLZLWYAA0dcs.jpg | 4 | toy_poodle | 0.442713 | True | Pomeranian | 0.142073 | True | Pekinese | 0.125745 | True | toy_poodle | 0.442713 |
| 45 | 666786068205871104 | https://pbs.twimg.com/media/CUDmZIkWcAAIPPe.jpg | 1 | snail | 0.999888 | False | slug | 0.000055 | False | acorn | 0.000026 | False | NaN | NaN |
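The "first prediction that is a dog" rule can also be sketched without iterrows, using numpy.select, which picks for each row the choice belonging to the first condition that is True. The frame `toy` below is made up for illustration and only mirrors the column names of the image predictions table.

```python
import numpy as np
import pandas as pd

# Toy stand-in with the same column names as the predictions table (hypothetical values)
toy = pd.DataFrame({
    'p1': ['pug', 'vacuum', 'snail'],
    'p1_conf': [0.99, 0.43, 0.99],
    'p1_dog': [True, False, False],
    'p2': ['bull_mastiff', 'pug', 'slug'],
    'p2_conf': [0.001, 0.15, 0.0001],
    'p2_dog': [True, True, False],
    'p3': ['French_bulldog', 'toilet_tissue', 'acorn'],
    'p3_conf': [0.0009, 0.02, 0.00003],
    'p3_dog': [True, False, False],
})

# Conditions are evaluated in order, so p1 wins over p2, and p2 over p3
conditions = [toy['p1_dog'], toy['p2_dog'], toy['p3_dog']]

toy['dog_type'] = np.select(conditions, [toy['p1'], toy['p2'], toy['p3']], default=None)
toy['p_conf'] = np.select(
    conditions, [toy['p1_conf'], toy['p2_conf'], toy['p3_conf']], default=np.nan
)

print(toy[['dog_type', 'p_conf']])
```

In the toy data, the second row has only its p2 prediction flagged as a dog, so it falls back to that prediction; the third row has no dog prediction and stays missing.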
# Define: Remove the columns that are not useful for the analysis
# Code
columns = image_df_clean.columns[2:-2].to_list()
image_df_clean.drop(columns, axis=1, inplace=True)
# Test
image_df_clean
| | tweet_id | jpg_url | dog_type | p_conf |
|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 0.465074 |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 0.506826 |
| 2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 0.596461 |
| 3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 0.408143 |
| 4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 0.560311 |
| ... | ... | ... | ... | ... |
| 2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 0.555712 |
| 2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | NaN | NaN |
| 2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 0.716012 |
| 2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 0.323581 |
| 2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | NaN | NaN |
2009 rows × 4 columns
# Define: Change dog_type column dtype to category
# Code
image_df_clean.dog_type = image_df_clean.dog_type.astype('category')
# Test
image_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2009 entries, 0 to 2074
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   tweet_id  2009 non-null   object
 1   jpg_url   2009 non-null   object
 2   dog_type  1638 non-null   category
 3   p_conf    1638 non-null   float64
dtypes: category(1), float64(1), object(2)
memory usage: 73.0+ KB
# Define: Final check and reset index
# Code
image_df_clean.reset_index(drop=True, inplace=True)
# Test
image_df_clean
| | tweet_id | jpg_url | dog_type | p_conf |
|---|---|---|---|---|
| 0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 0.465074 |
| 1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 0.506826 |
| 2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 0.596461 |
| 3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 0.408143 |
| 4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 0.560311 |
| ... | ... | ... | ... | ... |
| 2004 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 0.555712 |
| 2005 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | NaN | NaN |
| 2006 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 0.716012 |
| 2007 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 0.323581 |
| 2008 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | NaN | NaN |
2009 rows × 4 columns
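For the Twitter API dataframe, we will keep only the id, retweet_count, and favorite_count columns, rename id to tweet_id, and change its dtype to object.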
# Copy the original dataframe first
tweepy_df_clean = tweepy_df.copy()
tweepy_df_clean.sample(10)
| | created_at | id | id_str | full_text | truncated | display_text_range | entities | extended_entities | source | in_reply_to_status_id | ... | favorited | retweeted | possibly_sensitive | possibly_sensitive_appealable | lang | retweeted_status | quoted_status_id | quoted_status_id_str | quoted_status_permalink | quoted_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 821 | 2016-08-17 01:20:27+00:00 | 765719909049503744 | 765719909049503744 | This is Brat. He has a hard time being ferocio... | False | [0, 115] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 765719895086596097, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 643 | 2016-10-23 19:42:02+00:00 | 790277117346975746 | 790277117346975744 | This is Bruce. He never backs down from a chal... | False | [0, 77] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 790277108719386624, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 62 | 2017-06-28 00:42:13+00:00 | 879862464715927552 | 879862464715927552 | This is Romeo. He would like to do an entrance... | False | [0, 91] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 879862459263307776, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2177 | 2015-11-23 02:19:29+00:00 | 668614819948453888 | 668614819948453888 | Here is a horned dog. Much grace. Can jump ove... | False | [0, 139] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 668614813715664896, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 2003 | 2015-12-01 05:26:34+00:00 | 671561002136281088 | 671561002136281088 | This is the best thing I've ever seen so sprea... | False | [0, 144] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 671561000215298048, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 1837 | 2015-12-11 03:05:37+00:00 | 675149409102012420 | 675149409102012416 | holy shit 12/10 https://t.co/p6O8X93bTQ | False | [0, 39] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 675149402210701313, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 657 | 2016-10-19 01:29:35+00:00 | 788552643979468800 | 788552643979468800 | RT @dog_rates: Say hello to mad pupper. You kn... | False | [0, 130] | {'hashtags': [], 'symbols': [], 'user_mentions... | NaN | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | {'created_at': 'Sat May 28 03:04:00 +0000 2016... | NaN | NaN | NaN | NaN |
| 766 | 2016-09-07 15:44:53+00:00 | 773547596996571136 | 773547596996571136 | This is Chelsea. She forgot how to dog. 11/10 ... | False | [0, 68] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 773547591439122432, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 1331 | 2016-02-25 19:04:13+00:00 | 702932127499816960 | 702932127499816960 | This is Chip. He's an Upper West Nile Pantaloo... | False | [0, 137] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 702932120042397696, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
| 1053 | 2016-06-02 16:10:29+00:00 | 738402415918125056 | 738402415918125056 | "Don't talk to me or my son ever again" ...10/... | False | [0, 57] | {'hashtags': [], 'symbols': [], 'user_mentions... | {'media': [{'id': 738402403196796928, 'id_str'... | <a href="http://twitter.com/download/iphone" r... | NaN | ... | False | False | 0.0 | 0.0 | en | NaN | NaN | NaN | NaN | NaN |
10 rows × 32 columns
# Define: Keep only the id, retweet_count, and favorite_count columns
# Code
tweepy_df_clean = tweepy_df_clean[['id', 'retweet_count', 'favorite_count']]
# Test
tweepy_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   id              2322 non-null   int64
 1   retweet_count   2322 non-null   int64
 2   favorite_count  2322 non-null   int64
dtypes: int64(3)
memory usage: 54.5 KB
# Define: Rename id column to tweet_id, then change dtype to object
# Code
tweepy_df_clean = tweepy_df_clean.rename({'id': 'tweet_id'}, axis=1)
tweepy_df_clean.tweet_id = tweepy_df_clean.tweet_id.astype('object')
# Test
tweepy_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   tweet_id        2322 non-null   object
 1   retweet_count   2322 non-null   int64
 2   favorite_count  2322 non-null   int64
dtypes: int64(2), object(1)
memory usage: 54.5+ KB
All three dataframes will be merged on tweet_id, which serves as the key. Because the merge is an inner join, only tweets present in all three dataframes are kept. After a final check, we will save the result to a CSV file named 'twitter_archive_master.csv'.
# Define: Join all three dataframes using the .merge() method
# Code
twitter_archive_master = archive_df_clean.merge(image_df_clean, on='tweet_id').merge(tweepy_df_clean, on='tweet_id')
twitter_archive_master.reset_index(drop=True, inplace=True)
# Test
twitter_archive_master
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage | jpg_url | dog_type | p_conf | retweet_count | favorite_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | 2017-08-01 16:23:56 | This is Phineas. He's a mystical boy. Only eve... | 13.0 | 10.0 | Phineas | NaN | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | NaN | NaN | 7604 | 35884 |
| 1 | 892177421306343426 | 2017-08-01 00:17:27 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | Tilly | NaN | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 0.323581 | 5631 | 30943 |
| 2 | 891815181378084864 | 2017-07-31 00:18:03 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | Archie | NaN | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 0.716012 | 3726 | 23295 |
| 3 | 891689557279858688 | 2017-07-30 15:58:51 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | Darla | NaN | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | NaN | NaN | 7773 | 39140 |
| 4 | 891327558926688256 | 2017-07-29 16:00:24 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | Franklin | NaN | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 0.555712 | 8378 | 37390 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1974 | 666049248165822465 | 2015-11-16 00:24:50 | Here we have a 1949 1st generation vulpix. Enj... | 5.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 0.560311 | 40 | 96 |
| 1975 | 666044226329800704 | 2015-11-16 00:04:52 | This is a purebred Piers Morgan. Loves to Netf... | 6.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 0.408143 | 130 | 269 |
| 1976 | 666033412701032449 | 2015-11-15 23:21:54 | Here is a very happy pup. Big fan of well-main... | 9.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 0.596461 | 41 | 111 |
| 1977 | 666029285002620928 | 2015-11-15 23:05:30 | This is a western brown Mitsubishi terrier. Up... | 7.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 0.506826 | 42 | 120 |
| 1978 | 666020888022790149 | 2015-11-15 22:32:08 | Here we have a Japanese Irish Setter. Lost eye... | 8.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 0.465074 | 459 | 2388 |
1979 rows × 12 columns
# Test, check dtypes
twitter_archive_master.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1979 entries, 0 to 1978
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   tweet_id            1979 non-null   object
 1   timestamp           1979 non-null   datetime64[ns]
 2   text                1979 non-null   object
 3   rating_numerator    1979 non-null   float64
 4   rating_denominator  1979 non-null   float64
 5   name                1436 non-null   object
 6   dog_stage           301 non-null    category
 7   jpg_url             1979 non-null   object
 8   dog_type            1619 non-null   category
 9   p_conf              1619 non-null   float64
 10  retweet_count       1979 non-null   int64
 11  favorite_count      1979 non-null   int64
dtypes: category(2), datetime64[ns](1), float64(3), int64(2), object(4)
memory usage: 167.0+ KB
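An inner join silently drops tweets that are missing from any one table, and a repeated key would silently multiply rows. As a sketch of how this could be checked, the example below uses three made-up frames (`archive`, `images`, `counts`) in place of the cleaned dataframes. The validate argument of merge raises an error if tweet_id is not unique on both sides, and indicator=True on an outer join shows which side each row came from.

```python
import pandas as pd

# Toy stand-ins for the three cleaned tables (hypothetical values)
archive = pd.DataFrame({'tweet_id': ['1', '2', '3'],
                        'rating_numerator': [13.0, 12.0, 11.0]})
images = pd.DataFrame({'tweet_id': ['1', '2'],
                       'dog_type': ['pug', None]})
counts = pd.DataFrame({'tweet_id': ['1', '3'],
                       'retweet_count': [10, 20]})

# Inner joins; validate raises MergeError if a key repeats on either side
master = (archive
          .merge(images, on='tweet_id', how='inner', validate='one_to_one')
          .merge(counts, on='tweet_id', how='inner', validate='one_to_one'))

# Outer join with an indicator column to audit what the inner join would drop
audit = archive.merge(images, on='tweet_id', how='outer', indicator=True)
missing_image = audit.loc[audit['_merge'] == 'left_only', 'tweet_id'].tolist()

print(len(archive), '->', len(master), 'rows; no image for', missing_image)
```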
# Define: Save complete dataframe into CSV file
# Code
twitter_archive_master.to_csv('twitter_archive_master.csv', index=False)
# Test
os.path.isfile('./twitter_archive_master.csv')
True
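CSV stores no dtype information, so the object, datetime, and category columns set up above are not preserved when the file is read back with default settings. The sketch below illustrates this on a made-up frame `toy`, written to an in-memory buffer instead of a file; storing the IDs as strings is an assumption made for the example.

```python
import io
import pandas as pd

# Toy stand-in for the master table (hypothetical subset of columns)
toy = pd.DataFrame({
    'tweet_id': ['892420643555336193', '892177421306343426'],
    'timestamp': pd.to_datetime(['2017-08-01 16:23:56', '2017-08-01 00:17:27']),
    'dog_stage': pd.Categorical(['doggo', None]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: dtypes are inferred, so tweet_id comes back as an integer
buffer.seek(0)
naive = pd.read_csv(buffer)

# Explicit read: restore the intended dtypes
buffer.seek(0)
restored = pd.read_csv(
    buffer,
    dtype={'tweet_id': str, 'dog_stage': 'category'},
    parse_dates=['timestamp'],
)

print(naive.dtypes)
print(restored.dtypes)
```

Passing the same dtype and parse_dates arguments when loading 'twitter_archive_master.csv' for analysis would keep the types consistent with the cleaned dataframe.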
twitter_archive_master
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage | jpg_url | dog_type | p_conf | retweet_count | favorite_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | 2017-08-01 16:23:56 | This is Phineas. He's a mystical boy. Only eve... | 13.0 | 10.0 | Phineas | NaN | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | NaN | NaN | 7604 | 35884 |
| 1 | 892177421306343426 | 2017-08-01 00:17:27 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | Tilly | NaN | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 0.323581 | 5631 | 30943 |
| 2 | 891815181378084864 | 2017-07-31 00:18:03 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | Archie | NaN | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 0.716012 | 3726 | 23295 |
| 3 | 891689557279858688 | 2017-07-30 15:58:51 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | Darla | NaN | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | NaN | NaN | 7773 | 39140 |
| 4 | 891327558926688256 | 2017-07-29 16:00:24 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | Franklin | NaN | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 0.555712 | 8378 | 37390 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1974 | 666049248165822465 | 2015-11-16 00:24:50 | Here we have a 1949 1st generation vulpix. Enj... | 5.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 0.560311 | 40 | 96 |
| 1975 | 666044226329800704 | 2015-11-16 00:04:52 | This is a purebred Piers Morgan. Loves to Netf... | 6.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 0.408143 | 130 | 269 |
| 1976 | 666033412701032449 | 2015-11-15 23:21:54 | Here is a very happy pup. Big fan of well-main... | 9.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 0.596461 | 41 | 111 |
| 1977 | 666029285002620928 | 2015-11-15 23:05:30 | This is a western brown Mitsubishi terrier. Up... | 7.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 0.506826 | 42 | 120 |
| 1978 | 666020888022790149 | 2015-11-15 22:32:08 | Here we have a Japanese Irish Setter. Lost eye... | 8.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 0.465074 | 459 | 2388 |
1979 rows × 12 columns
twitter_archive_master.name.value_counts()
a 55
Charlie 10
Oliver 10
Cooper 10
Lucy 9
..
Joey 1
Evy 1
Bloop 1
Shadoe 1
Sailer 1
Name: name, Length: 930, dtype: int64
twitter_archive_master.name.value_counts().head(10).plot(kind='barh')
plt.title('Dog Name Count')
plt.xlabel('Name Count')
plt.ylabel('Dog Name');
# Print the full text of every tweet whose extracted name is "a"
named_a = twitter_archive_master.index[twitter_archive_master.name == 'a']
for s in named_a:
    print(s, "\t", twitter_archive_master.loc[s, 'text'])
49 Here is a pupper approaching maximum borkdrive. Zooming at never before seen speeds. 14/10 paw-inspiring af (IG: puffie_the_chow) https://t.co/ghXBIIeQZF
462 Here is a perfect example of someone who has their priorities in order. 13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq
571 Guys this is getting so out of hand. We only rate dogs. This is a Galapagos Speed Panda. Pls only send dogs... 10/10 https://t.co/8lpAGaZRFn
734 This is a mighty rare blue-tailed hammer sherk. Human almost lost a limb trying to take these. Be careful guys. 8/10 https://t.co/TGenMeXreW
736 Viewer discretion is advised. This is a terrible attack in progress. Not even in water (tragic af). 4/10 bad sherk https://t.co/L3U0j14N5R
745 This is a carrot. We only rate dogs. Please only send in dogs. You all really should know this by now ...11/10 https://t.co/9e48aPrBm2
771 This is a very rare Great Alaskan Bush Pupper. Hard to stumble upon without spooking. 12/10 would pet passionately https://t.co/xOBKCdpzaa
907 People please. This is a Deadly Mediterranean Plop T-Rex. We only rate dogs. Only send in dogs. Thanks you... 11/10 https://t.co/2ATDsgHD4n
917 This is a taco. We only rate dogs. Please only send in dogs. Dogs are what we rate. Not tacos. Thank you... 10/10 https://t.co/cxl6xGY8B9
1032 Here is a heartbreaking scene of an incredible pupper being laid to rest. 10/10 RIP pupper https://t.co/81mvJ0rGRu
1041 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa
1051 This is a Butternut Cumberfloof. It's not windy they just look like that. 11/10 back at it again with the red socks https://t.co/hMjzhdUHaW
1057 This is a Wild Tuscan Poofwiggle. Careful not to startle. Rare tongue slip. One eye magical. 12/10 would def pet https://t.co/4EnShAQjv6
1069 "Pupper is a present to world. Here is a bow for pupper." 12/10 precious as hell https://t.co/ItSsE92gCW
1172 This is a rare Arctic Wubberfloof. Unamused by the happenings. No longer has the appetites. 12/10 would totally hug https://t.co/krvbacIX0N
1384 Guys this really needs to stop. We've been over this way too many times. This is a giraffe. We only rate dogs.. 7/10 https://t.co/yavgkHYPOC
1427 This is a dog swinging. I really enjoyed it so I hope you all do as well. 11/10 https://t.co/Ozo9KHTRND
1489 This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. Chiller as hell. 10/10 and I'm out.. poof https://t.co/7E0AiJXPmI
1490 Seriously guys?! Only send in dogs. I only rate dogs. This is a baby black bear... 11/10 https://t.co/H7kpabTfLj
1513 C'mon guys. We've been over this. We only rate dogs. This is a cow. Please only submit dogs. Thank you...... 9/10 https://t.co/WjcELNEqN2
1514 This is a fluffy albino Bacardi Columbia mix. Excellent at the tweets. 11/10 would hug gently https://t.co/diboDRUuEI
1555 This is a Sagitariot Baklava mix. Loves her new hat. 11/10 radiant pup https://t.co/Bko5kFJYUU
1572 This is a heavily opinionated dog. Loves walls. Nobody knows how the hair works. Always ready for a kiss. 4/10 https://t.co/dFiaKZ9cDl
1586 This is a Lofted Aphrodisiac Terrier named Kip. Big fan of bed n breakfasts. Fits perfectly. 10/10 would pet firmly https://t.co/gKlLpNzIl3
1624 This is a baby Rand Paul. Curls for days. 11/10 would cuddle the hell out of https://t.co/xHXNaPAYRe
1664 This is a Tuscaloosa Alcatraz named Jacob (Yacōb). Loves to sit in swing. Stellar tongue. 11/10 look at his feet https://t.co/2IslQ8ZSc7
1695 This is a Helvetica Listerine named Rufus. This time Rufus will be ready for the UPS guy. He'll never expect it 9/10 https://t.co/34OhVhMkVr
1745 This is a Deciduous Trimester mix named Spork. Only 1 ear works. No seat belt. Incredibly reckless. 9/10 still cute https://t.co/CtuJoLHiDo
1754 This is a Rich Mahogany Seltzer named Cherokee. Just got destroyed by a snowball. Isn't very happy about it. 9/10 https://t.co/98ZBi6o4dj
1757 This is a Speckled Cauliflower Yosemite named Hemry. He's terrified of intruder dog. Not one bit comfortable. 9/10 https://t.co/yV3Qgjh8iN
1775 This is a spotted Lipitor Rumpelstiltskin named Alphred. He can't wait for the Turkey. 10/10 would pet really well https://t.co/6GUGO7azNX
1781 This is a brave dog. Excellent free climber. Trying to get closer to God. Not very loyal though. Doesn't bark. 5/10 https://t.co/ODnILTr4QM
1789 This is a Coriander Baton Rouge named Alfredo. Loves to cuddle with smaller well-dressed dog. 10/10 would hug lots https://t.co/eCRdwouKCl
1818 This is a Slovakian Helter Skelter Feta named Leroi. Likes to skip on roofs. Good traction. Much balance. 10/10 wow! https://t.co/Dmy2mY2Qj5
1825 This is a wild Toblerone from Papua New Guinea. Mouth always open. Addicted to hay. Acts blind. 7/10 handsome dog https://t.co/IGmVbz07tZ
1838 Here is a horned dog. Much grace. Can jump over moons (dam!). Paws not soft. Bad at barking. 7/10 can still pet tho https://t.co/2Su7gmsnZm
1844 This is a Birmingham Quagmire named Chuk. Loves to relax and watch the game while sippin on that iced mocha. 10/10 https://t.co/HvNg9JWxFt
1848 Here is a mother dog caring for her pups. Snazzy red mohawk. Doesn't wag tail. Pups look confused. Overall 4/10 https://t.co/YOHe6lf09m
1861 This is a Trans Siberian Kellogg named Alfonso. Huge ass eyeballs. Actually Dobby from Harry Potter. 7/10 https://t.co/XpseHBlAAb
1875 This is a Shotokon Macadamia mix named Cheryl. Sophisticated af. Looks like a disappointed librarian. Shh (lol) 9/10 https://t.co/J4GnJ5Swba
1881 This is a rare Hungarian Pinot named Jessiga. She is either mid-stroke or got stuck in the washing machine. 8/10 https://t.co/ZU0i0KJyqD
1888 This is a southwest Coriander named Klint. Hat looks expensive. Still on house arrest :( 9/10 https://t.co/IQTOMqDUIe
1897 This is a northern Wahoo named Kohl. He runs this town. Chases tumbleweeds. Draws gun wicked fast. 11/10 legendary https://t.co/J4vn2rOYFk
1911 This is a Dasani Kingfisher from Maine. His name is Daryl. Daryl doesn't like being swallowed by a panda. 8/10 https://t.co/jpaeu6LNmW
1927 This is a curly Ticonderoga named Pepe. No feet. Loves to jet ski. 11/10 would hug until forever https://t.co/cyDfaK8NBc
1934 This is a purebred Bacardi named Octaviath. Can shoot spaghetti out of mouth. 10/10 https://t.co/uEvsGLOFHa
1937 This is a golden Buckminsterfullerene named Johm. Drives trucks. Lumberjack (?). Enjoys wall. 8/10 would hug softly https://t.co/uQbZJM2DQB
1950 This is a southern Vesuvius bumblegruff. Can drive a truck (wow). Made friends with 5 other nifty dogs (neat). 7/10 https://t.co/LopTBkKa8h
1957 This is a funny dog. Weird toes. Won't come down. Loves branch. Refuses to eat his food. Hard to cuddle with. 3/10 https://t.co/IIXis0zta0
1970 My oh my. This is a rare blond Canadian terrier on wheels. Only $8.98. Rather docile. 9/10 very rare https://t.co/yWBqbrzy8O
1971 Here is a Siberian heavily armored polar bear mix. Strong owner. 10/10 I would do unspeakable things to pet this dog https://t.co/rdivxLiqEt
1973 This is a truly beautiful English Wilson Staff retriever. Has a nice phone. Privileged. 10/10 would trade lives with https://t.co/fvIbQfHjIe
1975 This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx
1976 Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR
1977 This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 7/10 would walk the shit out of https://t.co/r7mOb2m0UI
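Dog names vary widely: the name column holds 930 distinct values. Interestingly, the most frequent value, "a" (55 tweets), is not a name at all but an artifact of extracting the word that follows "This is" or "Here is". Most of these tweets never state a name and only describe the dog's stage or (often invented) breed, although some do give one after the word "named" (for example Wylie, Kip, and Jacob) that the extraction missed.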
Leaving "a" aside, the most common dog names are Charlie, Oliver, and Cooper, each appearing 10 times.
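The printed tweets suggest that some of the rows labelled "a" could still yield a real name. The following is a minimal sketch of one way to recover it, not the approach used in this notebook: `recover_name` and `NAME_PATTERN` are hypothetical names, the rule that real names start with a capital letter is an assumption, and the sample strings are shortened from the tweets printed above.

```python
import re

# Hypothetical pattern: capture a capitalized word after "named" or "name is".
NAME_PATTERN = re.compile(r"\b(?:named|name is)\s+([A-Z][\w'-]*)")

def recover_name(text, current_name):
    """Return a usable name for a tweet, or None if none can be found."""
    # Assumption: real names are capitalized, so lowercase values such as
    # "a" are treated as extraction artifacts and re-derived from the text.
    if isinstance(current_name, str) and current_name[:1].isupper():
        return current_name
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None

# (text, currently extracted name) pairs, shortened from the output above
samples = [
    ("This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes.", "a"),
    ("This is a Dasani Kingfisher from Maine. His name is Daryl.", "a"),
    ("This is a carrot. We only rate dogs.", "a"),
    ("This is Archie.", "Archie"),
]

recovered = [recover_name(text, name) for text, name in samples]
print(recovered)
```

Applied row by row to the `text` and `name` columns, a helper like this would leave tweets without any stated name as missing values instead of "a".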
twitter_archive_master.dog_type.value_counts()
golden_retriever 147
Labrador_retriever 98
Pembroke 93
Chihuahua 84
pug 55
...
Japanese_spaniel 1
loggerhead 1
maillot 1
mink 1
wood_rabbit 1
Name: dog_type, Length: 164, dtype: int64
twitter_archive_master.dog_type.value_counts().head(10).plot(kind='barh')
plt.title('Dog Type Post Count')
plt.xlabel('Post Count')
plt.ylabel('Dog Type');
golden_retriever = twitter_archive_master[twitter_archive_master['dog_type'] == 'golden_retriever']['jpg_url'].values[0]
response = requests.get(golden_retriever)
print('One of the most popular dogs')
Image.open(BytesIO(response.content))
One of the most popular dogs
counts = ['retweet_count', 'favorite_count']
# Select the count columns with a list; indexing a groupby with multiple bare keys is deprecated
sum_count = twitter_archive_master.groupby('dog_type')[counts].sum().sort_values(by=counts, ascending=False)
sum_count
| dog_type | retweet_count | favorite_count |
|---|---|---|
| golden_retriever | 483833 | 1681869 |
| Labrador_retriever | 322538 | 1023817 |
| Pembroke | 251702 | 942916 |
| Chihuahua | 199845 | 641301 |
| Samoyed | 158874 | 480634 |
| ... | ... | ... |
| groenendael | 363 | 1727 |
| corn | 342 | 1052 |
| hyena | 273 | 1285 |
| indri | 192 | 523 |
| hair_spray | 79 | 310 |
164 rows × 2 columns
The most common predicted dog type in the WeRateDogs data is golden_retriever (147 tweets), and it also has the highest total retweet count and favorite count of all types.
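Ranked by average retweet and favorite count per tweet, however, house_finch comes first, and golden_retriever is not even in the top 10. Note that several of the leaders (house_finch, leafhopper, oscilloscope, academic_gown) are not dog breeds but image-classifier labels, and their whole-number averages suggest that each rests on very few tweets, possibly only one.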
# Select the count columns with a list; indexing a groupby with multiple bare keys is deprecated
mean_count = twitter_archive_master.groupby('dog_type')[counts].mean().sort_values(by=counts, ascending=False)
mean_count.head(10)
| dog_type | retweet_count | favorite_count |
|---|---|---|
| house_finch | 35006.000000 | 75477.000000 |
| leafhopper | 30004.000000 | 74161.000000 |
| oscilloscope | 12614.000000 | 27701.000000 |
| Bedlington_terrier | 7225.500000 | 22790.833333 |
| standard_poodle | 5200.625000 | 13054.250000 |
| Afghan_hound | 5156.666667 | 15630.000000 |
| Eskimo_dog | 4772.578947 | 13361.894737 |
| English_springer | 4725.300000 | 12878.000000 |
| academic_gown | 4593.000000 | 19207.000000 |
| Saluki | 4459.250000 | 22022.000000 |
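Averages like the ones above can be dominated by categories that appear only once or twice. One hedged way to make the ranking more robust is to report the number of tweets next to each mean and to keep only types above a minimum count. The sketch below uses toy numbers (not taken from the archive), and `MIN_TWEETS` is an assumed threshold that would need tuning on the real data.

```python
import pandas as pd

# Toy data for illustration only; these values are not from the archive.
df = pd.DataFrame({
    "dog_type": ["golden_retriever", "golden_retriever", "golden_retriever",
                 "house_finch", "pug", "pug"],
    "retweet_count": [3000, 4000, 2000, 35000, 1000, 3000],
    "favorite_count": [12000, 11000, 10000, 75000, 4000, 6000],
})

MIN_TWEETS = 2  # assumed threshold for a "reliable" average

# Named aggregation: tweet count plus both means per predicted type
summary = df.groupby("dog_type").agg(
    tweets=("retweet_count", "size"),
    mean_retweets=("retweet_count", "mean"),
    mean_favorites=("favorite_count", "mean"),
)

# Drop types with too few tweets before ranking by the mean
reliable = (
    summary[summary["tweets"] >= MIN_TWEETS]
    .sort_values("mean_retweets", ascending=False)
)
print(reliable)
```

With the threshold applied, the single-tweet category no longer tops the ranking, while the `tweets` column keeps the sample size visible for every remaining type.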
# First, create a new column holding each post's rating (numerator divided by denominator)
numerator = twitter_archive_master.rating_numerator
denominator = twitter_archive_master.rating_denominator
twitter_archive_master['rating'] = numerator / denominator
twitter_archive_master
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage | jpg_url | dog_type | p_conf | retweet_count | favorite_count | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | 2017-08-01 16:23:56 | This is Phineas. He's a mystical boy. Only eve... | 13.0 | 10.0 | Phineas | NaN | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | NaN | NaN | 7604 | 35884 | 1.3 |
| 1 | 892177421306343426 | 2017-08-01 00:17:27 | This is Tilly. She's just checking pup on you.... | 13.0 | 10.0 | Tilly | NaN | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | Chihuahua | 0.323581 | 5631 | 30943 | 1.3 |
| 2 | 891815181378084864 | 2017-07-31 00:18:03 | This is Archie. He is a rare Norwegian Pouncin... | 12.0 | 10.0 | Archie | NaN | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | Chihuahua | 0.716012 | 3726 | 23295 | 1.2 |
| 3 | 891689557279858688 | 2017-07-30 15:58:51 | This is Darla. She commenced a snooze mid meal... | 13.0 | 10.0 | Darla | NaN | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | NaN | NaN | 7773 | 39140 | 1.3 |
| 4 | 891327558926688256 | 2017-07-29 16:00:24 | This is Franklin. He would like you to stop ca... | 12.0 | 10.0 | Franklin | NaN | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | basset | 0.555712 | 8378 | 37390 | 1.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1974 | 666049248165822465 | 2015-11-16 00:24:50 | Here we have a 1949 1st generation vulpix. Enj... | 5.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | miniature_pinscher | 0.560311 | 40 | 96 | 0.5 |
| 1975 | 666044226329800704 | 2015-11-16 00:04:52 | This is a purebred Piers Morgan. Loves to Netf... | 6.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | Rhodesian_ridgeback | 0.408143 | 130 | 269 | 0.6 |
| 1976 | 666033412701032449 | 2015-11-15 23:21:54 | Here is a very happy pup. Big fan of well-main... | 9.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | German_shepherd | 0.596461 | 41 | 111 | 0.9 |
| 1977 | 666029285002620928 | 2015-11-15 23:05:30 | This is a western brown Mitsubishi terrier. Up... | 7.0 | 10.0 | a | NaN | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | redbone | 0.506826 | 42 | 120 | 0.7 |
| 1978 | 666020888022790149 | 2015-11-15 22:32:08 | Here we have a Japanese Irish Setter. Lost eye... | 8.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | Welsh_springer_spaniel | 0.465074 | 459 | 2388 | 0.8 |
1979 rows × 13 columns
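# Sum only the rating columns; summing every column fails or misbehaves on text and timestamp columns in recent pandas
rating_cols = ['rating_numerator', 'rating_denominator', 'rating']
rating = twitter_archive_master.groupby('dog_type')[rating_cols].sum().sort_values(by='rating', ascending=False)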
rating[['rating_numerator', 'rating_denominator', 'rating']]
| dog_type | rating_numerator | rating_denominator | rating |
|---|---|---|---|
| golden_retriever | 1942.5 | 1668.0 | 173.196753 |
| Labrador_retriever | 1352.0 | 1220.0 | 108.800000 |
| Pembroke | 1059.0 | 930.0 | 105.900000 |
| Chihuahua | 899.0 | 840.0 | 89.900000 |
| pug | 565.0 | 550.0 | 56.500000 |
| ... | ... | ... | ... |
| mosquito_net | 8.0 | 10.0 | 0.800000 |
| ram | 7.0 | 10.0 | 0.700000 |
| sunglasses | 6.0 | 10.0 | 0.600000 |
| Japanese_spaniel | 5.0 | 10.0 | 0.500000 |
| loggerhead | 3.0 | 10.0 | 0.300000 |
164 rows × 3 columns
loggerhead = twitter_archive_master[twitter_archive_master['dog_type'] == 'loggerhead']['jpg_url'].values[0]
response = requests.get(loggerhead)
print('One of the lowest-rated dogs')
Image.open(BytesIO(response.content))
One of the lowest-rated dogs
print(f"The dog type with the highest total rating is {rating.iloc[0].name} with a total of {rating.iloc[0]['rating']}")
print(f"The dog type with the lowest total rating is {rating.iloc[-1].name} with a total of {rating.iloc[-1]['rating']}")
The dog type with the highest total rating is golden_retriever with a total of 173.19675324675308
The dog type with the lowest total rating is loggerhead with a total of 0.3
twitter_archive_master.sort_values(by='rating', ascending=False)
| | tweet_id | timestamp | text | rating_numerator | rating_denominator | name | dog_stage | jpg_url | dog_type | p_conf | retweet_count | favorite_count | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 714 | 749981277374128128 | 2016-07-04 15:00:45 | This is Atticus. He's quite simply America af.... | 1776.0 | 10.0 | Atticus | NaN | https://pbs.twimg.com/media/CmgBZ7kWcAAlzFD.jpg | NaN | NaN | 2444 | 5090 | 177.600000 |
| 1703 | 670842764863651840 | 2015-11-29 05:52:33 | After so many requests... here you go.\n\nGood... | 420.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CU9P717W4AAOlKx.jpg | NaN | NaN | 8210 | 23516 | 42.000000 |
| 376 | 810984652412424192 | 2016-12-19 23:06:23 | Meet Sam. She smiles 24/7 & secretly aspir... | 24.0 | 7.0 | Sam | NaN | https://pbs.twimg.com/media/C0EyPZbXAAAceSc.jpg | golden_retriever | 0.871342 | 1452 | 5384 | 3.428571 |
| 325 | 819004803107983360 | 2017-01-11 02:15:36 | This is Bo. He was a very good First Doggo. 14... | 14.0 | 10.0 | Bo | doggo | https://pbs.twimg.com/media/C12whDoVEAALRxa.jpg | standard_poodle | 0.351308 | 37054 | 87338 | 1.400000 |
| 1267 | 685547936038666240 | 2016-01-08 19:45:39 | Everybody needs to read this. Jack is our firs... | 14.0 | 10.0 | NaN | pupper | https://pbs.twimg.com/media/CYOONfZW8AA7IOA.jpg | NaN | NaN | 15366 | 32487 | 1.400000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1885 | 667549055577362432 | 2015-11-20 03:44:31 | Never seen dog like this. Breathes heavy. Tilt... | 1.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CUOcVCwWsAERUKY.jpg | NaN | NaN | 2113 | 5485 | 0.100000 |
| 1505 | 675153376133427200 | 2015-12-11 03:21:23 | What kind of person sends in a picture without... | 1.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CV6gaUUWEAAnETq.jpg | NaN | NaN | 2471 | 6006 | 0.100000 |
| 1720 | 670783437142401025 | 2015-11-29 01:56:48 | Flamboyant pup here. Probably poisonous. Won't... | 1.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/CU8Z-OxXAAA-sd2.jpg | NaN | NaN | 362 | 786 | 0.100000 |
| 744 | 746906459439529985 | 2016-06-26 03:22:31 | PUPDATE: can't see any. Even if I could, I cou... | 0.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/Cl2LdofXEAATl7x.jpg | NaN | NaN | 293 | 2874 | 0.000000 |
| 230 | 835152434251116546 | 2017-02-24 15:40:31 | When you're so blinded by your systematic plag... | 0.0 | 10.0 | NaN | NaN | https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg | American_Staffordshire_terrier | 0.012731 | 2987 | 22268 | 0.000000 |
1979 rows × 13 columns
# numeric_only=True skips text and timestamp columns, which cannot be averaged
avg_rating = twitter_archive_master.groupby('dog_type').mean(numeric_only=True).sort_values(by='rating', ascending=False)
avg_rating['avg_rating'] = avg_rating['rating']
avg_rating[['rating_numerator', 'rating_denominator', 'avg_rating']]
| dog_type | rating_numerator | rating_denominator | avg_rating |
|---|---|---|---|
| racket | 13.0 | 10.0 | 1.3 |
| paddle | 13.0 | 10.0 | 1.3 |
| timber_wolf | 13.0 | 10.0 | 1.3 |
| house_finch | 13.0 | 10.0 | 1.3 |
| oxygen_mask | 13.0 | 10.0 | 1.3 |
| ... | ... | ... | ... |
| plow | 8.0 | 10.0 | 0.8 |
| ram | 7.0 | 10.0 | 0.7 |
| sunglasses | 6.0 | 10.0 | 0.6 |
| Japanese_spaniel | 5.0 | 10.0 | 0.5 |
| loggerhead | 3.0 | 10.0 | 0.3 |
164 rows × 3 columns
print(f"The dog type with the highest average rating is {avg_rating.iloc[0].name} with an average of {avg_rating.iloc[0]['rating']}")
print(f"The dog type with the lowest average rating is {avg_rating.iloc[-1].name} with an average of {avg_rating.iloc[-1]['rating']}")
The dog type with the highest average rating is racket with an average of 1.3
The dog type with the lowest average rating is loggerhead with an average of 0.3
clumber = twitter_archive_master[twitter_archive_master['dog_type'] == 'clumber']['jpg_url'].values[0]
response = requests.get(clumber)
print('One of the lowest-rated dogs')
print(twitter_archive_master[twitter_archive_master['dog_type'] == 'clumber']['name'].values[0])
Image.open(BytesIO(response.content))
One of the lowest-rated dogs
Sophie
# numeric_only=True restricts the correlation matrix to numeric columns
twitter_archive_master.corr(numeric_only=True)
| | rating_numerator | rating_denominator | p_conf | retweet_count | favorite_count | rating |
|---|---|---|---|---|---|---|
| rating_numerator | 1.000000 | 0.198444 | 0.017734 | 0.018127 | 0.015940 | 0.979811 |
| rating_denominator | 0.198444 | 1.000000 | -0.013553 | -0.020164 | -0.027449 | -0.001055 |
| p_conf | 0.017734 | -0.013553 | 1.000000 | 0.032842 | 0.065554 | 0.136530 |
| retweet_count | 0.018127 | -0.020164 | 0.032842 | 1.000000 | 0.925425 | 0.022508 |
| favorite_count | 0.015940 | -0.027449 | 0.065554 | 0.925425 | 1.000000 | 0.021719 |
| rating | 0.979811 | -0.001055 | 0.136530 | 0.022508 | 0.021719 | 1.000000 |
print(twitter_archive_master.retweet_count.corr(twitter_archive_master.favorite_count))
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.regplot(x='retweet_count', y='favorite_count', data=twitter_archive_master);
0.9254252213316151
From the correlation table and the regression plot above, retweet_count and favorite_count have a strong positive correlation (r ≈ 0.93): tweets that are retweeted more also tend to be favorited more.
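For readers who want to see what the correlation coefficient measures, here is a small standard-library sketch of Pearson's r, the default method used by pandas' `corr`. The helper name `pearson_r` is hypothetical, and the five (retweet, favorite) pairs are the last five rows of the table shown earlier, so the result for this tiny sample will differ from the value computed on the full dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean (the 1/n factors cancel)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Last five rows of the displayed table: retweet_count and favorite_count
retweets = [40, 130, 41, 42, 459]
favorites = [96, 269, 111, 120, 2388]

r = pearson_r(retweets, favorites)
print(round(r, 3))
```

A value close to 1 means the points lie near an upward-sloping straight line, which is what the regression plot shows for the full dataset.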